Server Admin Log
2024-10-16
- 06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70122 and previous config saved to /var/cache/conftool/dbconfig/20241016-063940-ladsgroup.json
- 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70121 and previous config saved to /var/cache/conftool/dbconfig/20241016-063210-ladsgroup.json
- 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70120 and previous config saved to /var/cache/conftool/dbconfig/20241016-063132-ladsgroup.json
- 06:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 06:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70119 and previous config saved to /var/cache/conftool/dbconfig/20241016-063107-ladsgroup.json
- 06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70118 and previous config saved to /var/cache/conftool/dbconfig/20241016-061703-ladsgroup.json
- 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70117 and previous config saved to /var/cache/conftool/dbconfig/20241016-061558-ladsgroup.json
- 06:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70116 and previous config saved to /var/cache/conftool/dbconfig/20241016-060051-ladsgroup.json
- 05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70115 and previous config saved to /var/cache/conftool/dbconfig/20241016-054544-ladsgroup.json
- 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70114 and previous config saved to /var/cache/conftool/dbconfig/20241016-053943-ladsgroup.json
- 05:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 05:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70113 and previous config saved to /var/cache/conftool/dbconfig/20241016-053918-ladsgroup.json
- 05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70112 and previous config saved to /var/cache/conftool/dbconfig/20241016-052411-ladsgroup.json
- 05:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70111 and previous config saved to /var/cache/conftool/dbconfig/20241016-050904-ladsgroup.json
- 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70110 and previous config saved to /var/cache/conftool/dbconfig/20241016-045356-ladsgroup.json
- 04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70109 and previous config saved to /var/cache/conftool/dbconfig/20241016-044657-ladsgroup.json
- 04:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 04:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70108 and previous config saved to /var/cache/conftool/dbconfig/20241016-044204-ladsgroup.json
- 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70107 and previous config saved to /var/cache/conftool/dbconfig/20241016-043757-ladsgroup.json
- 04:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70106 and previous config saved to /var/cache/conftool/dbconfig/20241016-043734-ladsgroup.json
- 04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70105 and previous config saved to /var/cache/conftool/dbconfig/20241016-042657-ladsgroup.json
- 04:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70104 and previous config saved to /var/cache/conftool/dbconfig/20241016-042227-ladsgroup.json
- 04:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70103 and previous config saved to /var/cache/conftool/dbconfig/20241016-041150-ladsgroup.json
- 04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70102 and previous config saved to /var/cache/conftool/dbconfig/20241016-040721-ladsgroup.json
- 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70101 and previous config saved to /var/cache/conftool/dbconfig/20241016-035643-ladsgroup.json
- 03:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70100 and previous config saved to /var/cache/conftool/dbconfig/20241016-035214-ladsgroup.json
- 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70099 and previous config saved to /var/cache/conftool/dbconfig/20241016-034932-ladsgroup.json
- 03:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 03:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70098 and previous config saved to /var/cache/conftool/dbconfig/20241016-034907-ladsgroup.json
- 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70097 and previous config saved to /var/cache/conftool/dbconfig/20241016-033400-ladsgroup.json
- 03:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70096 and previous config saved to /var/cache/conftool/dbconfig/20241016-031852-ladsgroup.json
- 03:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70095 and previous config saved to /var/cache/conftool/dbconfig/20241016-030345-ladsgroup.json
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70094 and previous config saved to /var/cache/conftool/dbconfig/20241016-025633-ladsgroup.json
- 02:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 02:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70093 and previous config saved to /var/cache/conftool/dbconfig/20241016-025608-ladsgroup.json
- 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70092 and previous config saved to /var/cache/conftool/dbconfig/20241016-024101-ladsgroup.json
- 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70091 and previous config saved to /var/cache/conftool/dbconfig/20241016-022554-ladsgroup.json
- 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70090 and previous config saved to /var/cache/conftool/dbconfig/20241016-021358-ladsgroup.json
- 02:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 02:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 02:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70089 and previous config saved to /var/cache/conftool/dbconfig/20241016-021347-ladsgroup.json
- 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70088 and previous config saved to /var/cache/conftool/dbconfig/20241016-021047-ladsgroup.json
- 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70087 and previous config saved to /var/cache/conftool/dbconfig/20241016-020333-ladsgroup.json
- 02:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 02:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70086 and previous config saved to /var/cache/conftool/dbconfig/20241016-020308-ladsgroup.json
- 01:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70085 and previous config saved to /var/cache/conftool/dbconfig/20241016-015840-ladsgroup.json
- 01:50 eileen: tools upgraded from 62f2d170 to 68f64e43
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70084 and previous config saved to /var/cache/conftool/dbconfig/20241016-014801-ladsgroup.json
- 01:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70083 and previous config saved to /var/cache/conftool/dbconfig/20241016-014333-ladsgroup.json
- 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70082 and previous config saved to /var/cache/conftool/dbconfig/20241016-013254-ladsgroup.json
- 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70081 and previous config saved to /var/cache/conftool/dbconfig/20241016-012826-ladsgroup.json
- 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70080 and previous config saved to /var/cache/conftool/dbconfig/20241016-011747-ladsgroup.json
- 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70079 and previous config saved to /var/cache/conftool/dbconfig/20241016-011036-ladsgroup.json
- 01:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 01:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70078 and previous config saved to /var/cache/conftool/dbconfig/20241016-011010-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70077 and previous config saved to /var/cache/conftool/dbconfig/20241016-005500-ladsgroup.json
- 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70076 and previous config saved to /var/cache/conftool/dbconfig/20241016-003953-ladsgroup.json
- 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70075 and previous config saved to /var/cache/conftool/dbconfig/20241016-002446-ladsgroup.json
- 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70074 and previous config saved to /var/cache/conftool/dbconfig/20241016-001629-ladsgroup.json
- 00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70073 and previous config saved to /var/cache/conftool/dbconfig/20241016-001604-ladsgroup.json
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70072 and previous config saved to /var/cache/conftool/dbconfig/20241016-000057-ladsgroup.json
2024-10-15
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70071 and previous config saved to /var/cache/conftool/dbconfig/20241015-235055-ladsgroup.json
- 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70070 and previous config saved to /var/cache/conftool/dbconfig/20241015-235017-ladsgroup.json
- 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70069 and previous config saved to /var/cache/conftool/dbconfig/20241015-234550-ladsgroup.json
- 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70068 and previous config saved to /var/cache/conftool/dbconfig/20241015-233510-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70067 and previous config saved to /var/cache/conftool/dbconfig/20241015-233043-ladsgroup.json
- 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70066 and previous config saved to /var/cache/conftool/dbconfig/20241015-232456-ladsgroup.json
- 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70065 and previous config saved to /var/cache/conftool/dbconfig/20241015-232423-ladsgroup.json
- 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70064 and previous config saved to /var/cache/conftool/dbconfig/20241015-232003-ladsgroup.json
- 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70063 and previous config saved to /var/cache/conftool/dbconfig/20241015-230916-ladsgroup.json
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70062 and previous config saved to /var/cache/conftool/dbconfig/20241015-230456-ladsgroup.json
- 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70061 and previous config saved to /var/cache/conftool/dbconfig/20241015-225409-ladsgroup.json
- 22:48 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70060 and previous config saved to /var/cache/conftool/dbconfig/20241015-223902-ladsgroup.json
- 22:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70059 and previous config saved to /var/cache/conftool/dbconfig/20241015-222936-ladsgroup.json
- 22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70058 and previous config saved to /var/cache/conftool/dbconfig/20241015-222911-ladsgroup.json
- 22:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 22:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70057 and previous config saved to /var/cache/conftool/dbconfig/20241015-221404-ladsgroup.json
- 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70056 and previous config saved to /var/cache/conftool/dbconfig/20241015-221356-ladsgroup.json
- 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70055 and previous config saved to /var/cache/conftool/dbconfig/20241015-220316-ladsgroup.json
- 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70054 and previous config saved to /var/cache/conftool/dbconfig/20241015-215857-ladsgroup.json
- 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70053 and previous config saved to /var/cache/conftool/dbconfig/20241015-215849-ladsgroup.json
- 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 21:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70052 and previous config saved to /var/cache/conftool/dbconfig/20241015-214811-ladsgroup.json
- 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70051 and previous config saved to /var/cache/conftool/dbconfig/20241015-214350-ladsgroup.json
- 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70050 and previous config saved to /var/cache/conftool/dbconfig/20241015-214342-ladsgroup.json
- 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70049 and previous config saved to /var/cache/conftool/dbconfig/20241015-213423-ladsgroup.json
- 21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 21:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70048 and previous config saved to /var/cache/conftool/dbconfig/20241015-213305-ladsgroup.json
- 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70047 and previous config saved to /var/cache/conftool/dbconfig/20241015-213227-ladsgroup.json
- 21:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70046 and previous config saved to /var/cache/conftool/dbconfig/20241015-213203-ladsgroup.json
- 21:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 21:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70045 and previous config saved to /var/cache/conftool/dbconfig/20241015-212835-ladsgroup.json
- 21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
- 21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
- 21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70044 and previous config saved to /var/cache/conftool/dbconfig/20241015-212431-ladsgroup.json
- 21:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 21:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70043 and previous config saved to /var/cache/conftool/dbconfig/20241015-211800-ladsgroup.json
- 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70042 and previous config saved to /var/cache/conftool/dbconfig/20241015-211656-ladsgroup.json
- 21:04 cjming: end of UTC late backport window
- 21:04 cjming@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 06m 51s)
- 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70041 and previous config saved to /var/cache/conftool/dbconfig/20241015-210149-ladsgroup.json
- 20:59 cjming@deploy2002: cjming, matmarex: Continuing with sync
- 20:59 cjming@deploy2002: cjming, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:57 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2194.codfw.wmnet onto db2205.codfw.wmnet
- 20:57 cjming@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
- 20:56 cjming@deploy2002: Finished scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) (duration: 12m 33s)
- 20:51 cjming@deploy2002: cjming, pppery: Continuing with sync
- 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70040 and previous config saved to /var/cache/conftool/dbconfig/20241015-204642-ladsgroup.json
- 20:46 cjming@deploy2002: cjming, pppery: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:43 cjming@deploy2002: Started scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923)
- 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) (duration: 08m 50s)
- 20:37 cjming@deploy2002: cjming, pppery: Continuing with sync
- 20:35 cjming@deploy2002: cjming, pppery: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:33 cjming@deploy2002: Started scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538)
- 20:31 cjming@deploy2002: Finished scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php (duration: 10m 56s)
- 20:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 20:27 cjming@deploy2002: ammarpad, cjming: Continuing with sync
- 20:23 cjming@deploy2002: ammarpad, cjming: Backport for contactpages: Move stewards contactpage to MetaContactPages.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:20 cjming@deploy2002: Started scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php
- 20:16 cjming@deploy2002: Finished scap sync-world: Backport for Remove legacy UI actions tracking (T376065) (duration: 12m 28s)
- 20:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 20:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 20:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 20:11 cjming@deploy2002: ksarabia, cjming: Continuing with sync
- 20:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:10 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 20:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 20:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
- 20:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
- 20:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 20:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 20:06 cjming@deploy2002: ksarabia, cjming: Backport for Remove legacy UI actions tracking (T376065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 20:03 cjming@deploy2002: Started scap sync-world: Backport for Remove legacy UI actions tracking (T376065)
- 20:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 20:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:01 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 19:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 19:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 19:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 19:16 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.27 refs T375658
- 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70039 and previous config saved to /var/cache/conftool/dbconfig/20241015-191345-ladsgroup.json
- 19:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 19:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70038 and previous config saved to /var/cache/conftool/dbconfig/20241015-191322-ladsgroup.json
- 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70037 and previous config saved to /var/cache/conftool/dbconfig/20241015-190231-arnaudb.json
- 18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70036 and previous config saved to /var/cache/conftool/dbconfig/20241015-185814-ladsgroup.json
- 18:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 18:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70035 and previous config saved to /var/cache/conftool/dbconfig/20241015-184724-arnaudb.json
- 18:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70034 and previous config saved to /var/cache/conftool/dbconfig/20241015-184307-ladsgroup.json
- 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2082
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2081
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2083
- 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2083
- 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2082
- 18:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2081
- 18:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
- 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
- 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70033 and previous config saved to /var/cache/conftool/dbconfig/20241015-183218-arnaudb.json
- 18:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70032 and previous config saved to /var/cache/conftool/dbconfig/20241015-182800-ladsgroup.json
- 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70031 and previous config saved to /var/cache/conftool/dbconfig/20241015-181930-ladsgroup.json
- 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70030 and previous config saved to /var/cache/conftool/dbconfig/20241015-181711-arnaudb.json
- 18:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70029 and previous config saved to /var/cache/conftool/dbconfig/20241015-181455-arnaudb.json
- 18:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 18:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70028 and previous config saved to /var/cache/conftool/dbconfig/20241015-181433-arnaudb.json
- 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70027 and previous config saved to /var/cache/conftool/dbconfig/20241015-180423-ladsgroup.json
- 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70026 and previous config saved to /var/cache/conftool/dbconfig/20241015-175926-arnaudb.json
- 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70025 and previous config saved to /var/cache/conftool/dbconfig/20241015-174916-ladsgroup.json
- 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70024 and previous config saved to /var/cache/conftool/dbconfig/20241015-174419-arnaudb.json
- 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70023 and previous config saved to /var/cache/conftool/dbconfig/20241015-173409-ladsgroup.json
- 17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70022 and previous config saved to /var/cache/conftool/dbconfig/20241015-172912-arnaudb.json
- 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70021 and previous config saved to /var/cache/conftool/dbconfig/20241015-172714-ladsgroup.json
- 17:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 17:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70020 and previous config saved to /var/cache/conftool/dbconfig/20241015-172657-arnaudb.json
- 17:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 17:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70019 and previous config saved to /var/cache/conftool/dbconfig/20241015-172648-ladsgroup.json
- 17:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70018 and previous config saved to /var/cache/conftool/dbconfig/20241015-172610-arnaudb.json
- 17:13 swfrench@deploy2002: Finished scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934 (duration: 02m 47s)
- 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70017 and previous config saved to /var/cache/conftool/dbconfig/20241015-171141-ladsgroup.json
- 17:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70016 and previous config saved to /var/cache/conftool/dbconfig/20241015-171103-arnaudb.json
- 17:10 swfrench@deploy2002: Started scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70015 and previous config saved to /var/cache/conftool/dbconfig/20241015-165634-ladsgroup.json
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70014 and previous config saved to /var/cache/conftool/dbconfig/20241015-165608-ladsgroup.json
- 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70013 and previous config saved to /var/cache/conftool/dbconfig/20241015-165556-arnaudb.json
- 16:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 16:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70012 and previous config saved to /var/cache/conftool/dbconfig/20241015-165539-ladsgroup.json
- 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70011 and previous config saved to /var/cache/conftool/dbconfig/20241015-164127-ladsgroup.json
- 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70010 and previous config saved to /var/cache/conftool/dbconfig/20241015-164050-arnaudb.json
- 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70009 and previous config saved to /var/cache/conftool/dbconfig/20241015-164032-ladsgroup.json
- 16:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70008 and previous config saved to /var/cache/conftool/dbconfig/20241015-163834-arnaudb.json
- 16:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 16:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P70007 and previous config saved to /var/cache/conftool/dbconfig/20241015-163812-arnaudb.json
- 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70006 and previous config saved to /var/cache/conftool/dbconfig/20241015-163419-ladsgroup.json
- 16:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P70005 and previous config saved to /var/cache/conftool/dbconfig/20241015-163404-ladsgroup.json
- 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70004 and previous config saved to /var/cache/conftool/dbconfig/20241015-162525-ladsgroup.json
- 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70003 and previous config saved to /var/cache/conftool/dbconfig/20241015-162305-arnaudb.json
- 16:21 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2205.codfw.wmnet
- 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P70002 and previous config saved to /var/cache/conftool/dbconfig/20241015-161934-ladsgroup.json
- 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P70001 and previous config saved to /var/cache/conftool/dbconfig/20241015-161858-ladsgroup.json
- 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70000 and previous config saved to /var/cache/conftool/dbconfig/20241015-161018-ladsgroup.json
- 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P69999 and previous config saved to /var/cache/conftool/dbconfig/20241015-160758-arnaudb.json
- 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P69998 and previous config saved to /var/cache/conftool/dbconfig/20241015-160351-ladsgroup.json
- 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2205 T377164', diff saved to https://phabricator.wikimedia.org/P69997 and previous config saved to /var/cache/conftool/dbconfig/20241015-160106-ladsgroup.json
- 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69996 and previous config saved to /var/cache/conftool/dbconfig/20241015-155251-arnaudb.json
- 15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T377164', diff saved to https://phabricator.wikimedia.org/P69995 and previous config saved to /var/cache/conftool/dbconfig/20241015-155240-ladsgroup.json
- 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69994 and previous config saved to /var/cache/conftool/dbconfig/20241015-154844-ladsgroup.json
- 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T377164', diff saved to https://phabricator.wikimedia.org/P69993 and previous config saved to /var/cache/conftool/dbconfig/20241015-154834-ladsgroup.json
- 15:48 Amir1: Starting s3 codfw failover from db2205 to db2209 - T377164
- 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69992 and previous config saved to /var/cache/conftool/dbconfig/20241015-154318-arnaudb.json
- 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69991 and previous config saved to /var/cache/conftool/dbconfig/20241015-154256-arnaudb.json
- 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T377164', diff saved to https://phabricator.wikimedia.org/P69990 and previous config saved to /var/cache/conftool/dbconfig/20241015-154228-ladsgroup.json
- 15:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
- 15:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
- 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69989 and previous config saved to /var/cache/conftool/dbconfig/20241015-154027-ladsgroup.json
- 15:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69988 and previous config saved to /var/cache/conftool/dbconfig/20241015-154002-ladsgroup.json
- 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69987 and previous config saved to /var/cache/conftool/dbconfig/20241015-152749-arnaudb.json
- 15:26 akosiaris: run gnt-cluster verify-disks after ganeti1034 forceful reboot
- 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69986 and previous config saved to /var/cache/conftool/dbconfig/20241015-152456-ladsgroup.json
- 15:22 volans: force-rebooting ganeti1034 stuck due to drbd traces via mgmt
- 15:19 akosiaris@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
- 15:17 akosiaris: drain ganeti1034 of VMs, hardware might be misbehaving
- 15:16 akosiaris@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69985 and previous config saved to /var/cache/conftool/dbconfig/20241015-151243-arnaudb.json
- 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69984 and previous config saved to /var/cache/conftool/dbconfig/20241015-150948-ladsgroup.json
- 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69983 and previous config saved to /var/cache/conftool/dbconfig/20241015-145734-arnaudb.json
- 14:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
- 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69982 and previous config saved to /var/cache/conftool/dbconfig/20241015-145517-arnaudb.json
- 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69981 and previous config saved to /var/cache/conftool/dbconfig/20241015-145453-arnaudb.json
- 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69980 and previous config saved to /var/cache/conftool/dbconfig/20241015-145441-ladsgroup.json
- 14:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 14:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
- 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69979 and previous config saved to /var/cache/conftool/dbconfig/20241015-144631-ladsgroup.json
- 14:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 14:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69978 and previous config saved to /var/cache/conftool/dbconfig/20241015-144606-ladsgroup.json
- 14:45 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 24s)
- 14:43 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 46s)
- 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69977 and previous config saved to /var/cache/conftool/dbconfig/20241015-143946-arnaudb.json
- 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P69976 and previous config saved to /var/cache/conftool/dbconfig/20241015-143803-ladsgroup.json
- 14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 14:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69975 and previous config saved to /var/cache/conftool/dbconfig/20241015-143740-ladsgroup.json
- 14:36 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
- 14:35 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
- 14:33 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
- 14:31 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
- 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69974 and previous config saved to /var/cache/conftool/dbconfig/20241015-143059-ladsgroup.json
- 14:29 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:28 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
- 14:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 14:26 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69973 and previous config saved to /var/cache/conftool/dbconfig/20241015-142439-arnaudb.json
- 14:24 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
- 14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 33m 23s)
- 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69972 and previous config saved to /var/cache/conftool/dbconfig/20241015-142233-ladsgroup.json
- 14:21 btullis@cumin1002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema
- 14:19 urbanecm@deploy2002: urbanecm, matmarex: Continuing with sync
- 14:17 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
- 14:16 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69971 and previous config saved to /var/cache/conftool/dbconfig/20241015-141552-ladsgroup.json
- 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69970 and previous config saved to /var/cache/conftool/dbconfig/20241015-140932-arnaudb.json
- 14:09 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69969 and previous config saved to /var/cache/conftool/dbconfig/20241015-140726-ladsgroup.json
- 14:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69968 and previous config saved to /var/cache/conftool/dbconfig/20241015-140716-arnaudb.json
- 14:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 14:08 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1020.eqiad.wmnet
- 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69967 and previous config saved to /var/cache/conftool/dbconfig/20241015-140638-arnaudb.json
- 14:05 btullis@cumin1002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema
- 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69966 and previous config saved to /var/cache/conftool/dbconfig/20241015-140045-ladsgroup.json
- 14:00 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1020.eqiad.wmnet
- 13:57 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1019.eqiad.wmnet
- 13:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
- 13:54 urbanecm@deploy2002: urbanecm, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69965 and previous config saved to /var/cache/conftool/dbconfig/20241015-135234-ladsgroup.json
- 13:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69964 and previous config saved to /var/cache/conftool/dbconfig/20241015-135213-ladsgroup.json
- 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69963 and previous config saved to /var/cache/conftool/dbconfig/20241015-135208-ladsgroup.json
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69962 and previous config saved to /var/cache/conftool/dbconfig/20241015-135131-arnaudb.json
- 13:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1019.eqiad.wmnet
- 13:50 urbanecm@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
- 13:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
- 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69961 and previous config saved to /var/cache/conftool/dbconfig/20241015-133701-ladsgroup.json
- 13:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69960 and previous config saved to /var/cache/conftool/dbconfig/20241015-133624-arnaudb.json
- 13:32 urbanecm@deploy2002: Finished scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 44s)
- 13:27 urbanecm@deploy2002: migr, urbanecm, zabe: Continuing with sync
- 13:26 urbanecm@deploy2002: migr, urbanecm, zabe: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:24 urbanecm@deploy2002: Started scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 13:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) (duration: 19m 25s)
- 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69959 and previous config saved to /var/cache/conftool/dbconfig/20241015-132154-ladsgroup.json
- 13:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69958 and previous config saved to /var/cache/conftool/dbconfig/20241015-132117-arnaudb.json
- 13:19 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1018.eqiad.wmnet
- 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69957 and previous config saved to /var/cache/conftool/dbconfig/20241015-131901-arnaudb.json
- 13:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69956 and previous config saved to /var/cache/conftool/dbconfig/20241015-131839-arnaudb.json
- 13:16 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Continuing with sync
- 13:12 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1018.eqiad.wmnet
- 13:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69955 and previous config saved to /var/cache/conftool/dbconfig/20241015-131122-ladsgroup.json
- 13:11 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1017.eqiad.wmnet
- 13:11 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69954 and previous config saved to /var/cache/conftool/dbconfig/20241015-130647-ladsgroup.json
- 13:04 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1017.eqiad.wmnet
- 13:04 urbanecm@deploy2002: Started scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833)
- 13:03 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1016.eqiad.wmnet
- 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69953 and previous config saved to /var/cache/conftool/dbconfig/20241015-130332-arnaudb.json
- 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69952 and previous config saved to /var/cache/conftool/dbconfig/20241015-125748-ladsgroup.json
- 12:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 12:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 12:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1016.eqiad.wmnet
- 12:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69951 and previous config saved to /var/cache/conftool/dbconfig/20241015-125615-ladsgroup.json
- 12:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 12:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 12:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69950 and previous config saved to /var/cache/conftool/dbconfig/20241015-125203-ladsgroup.json
- 12:50 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
- 12:50 elukey: destroy old certs from puppetmaster1001's CA (parsoid.svc.{eqiad,codfw}.wmnet, debmonitor.discovery.wmnet)
- 12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69949 and previous config saved to /var/cache/conftool/dbconfig/20241015-124825-arnaudb.json
- 12:46 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
- 12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69948 and previous config saved to /var/cache/conftool/dbconfig/20241015-124108-ladsgroup.json
- 12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69947 and previous config saved to /var/cache/conftool/dbconfig/20241015-123656-ladsgroup.json
- 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69946 and previous config saved to /var/cache/conftool/dbconfig/20241015-123318-arnaudb.json
- 12:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69945 and previous config saved to /var/cache/conftool/dbconfig/20241015-123101-arnaudb.json
- 12:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 12:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69944 and previous config saved to /var/cache/conftool/dbconfig/20241015-123039-arnaudb.json
- 12:30 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:29 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69943 and previous config saved to /var/cache/conftool/dbconfig/20241015-122601-ladsgroup.json
- 12:24 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69942 and previous config saved to /var/cache/conftool/dbconfig/20241015-122251-ladsgroup.json
- 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69941 and previous config saved to /var/cache/conftool/dbconfig/20241015-122149-ladsgroup.json
- 12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69940 and previous config saved to /var/cache/conftool/dbconfig/20241015-121706-ladsgroup.json
- 12:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69939 and previous config saved to /var/cache/conftool/dbconfig/20241015-121532-arnaudb.json
- 12:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69938 and previous config saved to /var/cache/conftool/dbconfig/20241015-121349-ladsgroup.json
- 12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69937 and previous config saved to /var/cache/conftool/dbconfig/20241015-120642-ladsgroup.json
- 12:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
- 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69936 and previous config saved to /var/cache/conftool/dbconfig/20241015-120025-arnaudb.json
- 11:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69935 and previous config saved to /var/cache/conftool/dbconfig/20241015-115842-ladsgroup.json
- 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69934 and previous config saved to /var/cache/conftool/dbconfig/20241015-115630-ladsgroup.json
- 11:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 11:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69933 and previous config saved to /var/cache/conftool/dbconfig/20241015-115606-ladsgroup.json
- 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69932 and previous config saved to /var/cache/conftool/dbconfig/20241015-114518-arnaudb.json
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69931 and previous config saved to /var/cache/conftool/dbconfig/20241015-114336-ladsgroup.json
- 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69930 and previous config saved to /var/cache/conftool/dbconfig/20241015-114302-arnaudb.json
- 11:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69929 and previous config saved to /var/cache/conftool/dbconfig/20241015-114240-arnaudb.json
- 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69927 and previous config saved to /var/cache/conftool/dbconfig/20241015-114059-ladsgroup.json
- 11:34 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69926 and previous config saved to /var/cache/conftool/dbconfig/20241015-112829-ladsgroup.json
- 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69925 and previous config saved to /var/cache/conftool/dbconfig/20241015-112733-arnaudb.json
- 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69924 and previous config saved to /var/cache/conftool/dbconfig/20241015-112551-ladsgroup.json
- 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69923 and previous config saved to /var/cache/conftool/dbconfig/20241015-111226-arnaudb.json
- 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69922 and previous config saved to /var/cache/conftool/dbconfig/20241015-111045-ladsgroup.json
- 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69921 and previous config saved to /var/cache/conftool/dbconfig/20241015-110741-ladsgroup.json
- 11:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 11:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69920 and previous config saved to /var/cache/conftool/dbconfig/20241015-110132-ladsgroup.json
- 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json
- 10:53 tappof: expand LVs on prometheus instances (k8s-mlserve and k8s-stagin) T377196
- 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json
- 10:38 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
- 10:38 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
- 10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json
- 10:34 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
- 10:30 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
- 10:26 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
- 10:25 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
- 10:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json
- 10:21 brouberol@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
- 10:14 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
- 10:11 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
- 10:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json
- 10:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json
- 09:57 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:55 brouberol@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
- 09:52 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json
- 09:26 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json
- 09:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json
- 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 09:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json
- 09:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json
- 08:47 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
- 08:46 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
- 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69903 and previous config saved to /var/cache/conftool/dbconfig/20241015-084448-arnaudb.json
- 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69902 and previous config saved to /var/cache/conftool/dbconfig/20241015-082941-arnaudb.json
- 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:27 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69901 and previous config saved to /var/cache/conftool/dbconfig/20241015-082727-arnaudb.json
- 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69900 and previous config saved to /var/cache/conftool/dbconfig/20241015-082704-arnaudb.json
- 08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69899 and previous config saved to /var/cache/conftool/dbconfig/20241015-081157-arnaudb.json
- 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69898 and previous config saved to /var/cache/conftool/dbconfig/20241015-075650-arnaudb.json
- 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69897 and previous config saved to /var/cache/conftool/dbconfig/20241015-074843-arnaudb.json
- 07:47 hashar: Restarted Gerrit - T373897
- 07:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897 (duration: 00m 09s)
- 07:46 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897
- 07:42 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897 (duration: 00m 07s)
- 07:42 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897
- 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69896 and previous config saved to /var/cache/conftool/dbconfig/20241015-074143-arnaudb.json
- 07:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897 (duration: 00m 07s)
- 07:40 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897
- 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69895 and previous config saved to /var/cache/conftool/dbconfig/20241015-073928-arnaudb.json
- 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69894 and previous config saved to /var/cache/conftool/dbconfig/20241015-073906-arnaudb.json
- 07:38 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
- 07:38 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
- 07:35 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69893 and previous config saved to /var/cache/conftool/dbconfig/20241015-073338-arnaudb.json
- 07:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69892 and previous config saved to /var/cache/conftool/dbconfig/20241015-072359-arnaudb.json
- 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69891 and previous config saved to /var/cache/conftool/dbconfig/20241015-071833-arnaudb.json
- 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69890 and previous config saved to /var/cache/conftool/dbconfig/20241015-070852-arnaudb.json
- 07:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69889 and previous config saved to /var/cache/conftool/dbconfig/20241015-070327-arnaudb.json
- 06:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69888 and previous config saved to /var/cache/conftool/dbconfig/20241015-065345-arnaudb.json
- 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69887 and previous config saved to /var/cache/conftool/dbconfig/20241015-065130-arnaudb.json
- 06:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 06:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 06:30 kart_: Updated MinT to 2024-10-11-113932-production
- 06:27 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 06:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 06:16 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:08 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 05:38 _joe_: restart tomcat on idp1004
- 05:35 _joe_: restart tomcat on idp2004
- 05:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 04:00 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.24 (duration: 00m 56s)
- 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658 (duration: 48m 30s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658
- 02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69885 and previous config saved to /var/cache/conftool/dbconfig/20241015-024037-ladsgroup.json
- 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69884 and previous config saved to /var/cache/conftool/dbconfig/20241015-022530-ladsgroup.json
- 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69883 and previous config saved to /var/cache/conftool/dbconfig/20241015-021023-ladsgroup.json
- 01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69882 and previous config saved to /var/cache/conftool/dbconfig/20241015-015516-ladsgroup.json
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69881 and previous config saved to /var/cache/conftool/dbconfig/20241015-014831-ladsgroup.json
- 01:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 01:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69880 and previous config saved to /var/cache/conftool/dbconfig/20241015-014803-ladsgroup.json
- 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69879 and previous config saved to /var/cache/conftool/dbconfig/20241015-013257-ladsgroup.json
- 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69878 and previous config saved to /var/cache/conftool/dbconfig/20241015-011749-ladsgroup.json
- 01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69877 and previous config saved to /var/cache/conftool/dbconfig/20241015-010242-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69876 and previous config saved to /var/cache/conftool/dbconfig/20241015-005551-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69875 and previous config saved to /var/cache/conftool/dbconfig/20241015-005546-ladsgroup.json
- 00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69874 and previous config saved to /var/cache/conftool/dbconfig/20241015-005525-ladsgroup.json
- 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69873 and previous config saved to /var/cache/conftool/dbconfig/20241015-004039-ladsgroup.json
- 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69872 and previous config saved to /var/cache/conftool/dbconfig/20241015-004018-ladsgroup.json
- 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69871 and previous config saved to /var/cache/conftool/dbconfig/20241015-002531-ladsgroup.json
- 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69870 and previous config saved to /var/cache/conftool/dbconfig/20241015-002511-ladsgroup.json
- 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69869 and previous config saved to /var/cache/conftool/dbconfig/20241015-001024-ladsgroup.json
- 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69868 and previous config saved to /var/cache/conftool/dbconfig/20241015-001004-ladsgroup.json
- 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69867 and previous config saved to /var/cache/conftool/dbconfig/20241015-000304-ladsgroup.json
- 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69866 and previous config saved to /var/cache/conftool/dbconfig/20241015-000236-ladsgroup.json
2024-10-14
- 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69865 and previous config saved to /var/cache/conftool/dbconfig/20241014-234729-ladsgroup.json
- 23:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69864 and previous config saved to /var/cache/conftool/dbconfig/20241014-233222-ladsgroup.json
- 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69863 and previous config saved to /var/cache/conftool/dbconfig/20241014-232857-ladsgroup.json
- 23:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 23:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69862 and previous config saved to /var/cache/conftool/dbconfig/20241014-232835-ladsgroup.json
- 23:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69861 and previous config saved to /var/cache/conftool/dbconfig/20241014-231715-ladsgroup.json
- 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69860 and previous config saved to /var/cache/conftool/dbconfig/20241014-231328-ladsgroup.json
- 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69859 and previous config saved to /var/cache/conftool/dbconfig/20241014-230903-ladsgroup.json
- 23:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 23:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69858 and previous config saved to /var/cache/conftool/dbconfig/20241014-230838-ladsgroup.json
- 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69857 and previous config saved to /var/cache/conftool/dbconfig/20241014-225818-ladsgroup.json
- 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69856 and previous config saved to /var/cache/conftool/dbconfig/20241014-225528-ladsgroup.json
- 22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69855 and previous config saved to /var/cache/conftool/dbconfig/20241014-225331-ladsgroup.json
- 22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69854 and previous config saved to /var/cache/conftool/dbconfig/20241014-224311-ladsgroup.json
- 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69853 and previous config saved to /var/cache/conftool/dbconfig/20241014-224022-ladsgroup.json
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69852 and previous config saved to /var/cache/conftool/dbconfig/20241014-223824-ladsgroup.json
- 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69851 and previous config saved to /var/cache/conftool/dbconfig/20241014-222515-ladsgroup.json
- 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69850 and previous config saved to /var/cache/conftool/dbconfig/20241014-222317-ladsgroup.json
- 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69849 and previous config saved to /var/cache/conftool/dbconfig/20241014-222009-ladsgroup.json
- 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69848 and previous config saved to /var/cache/conftool/dbconfig/20241014-221508-ladsgroup.json
- 22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69847 and previous config saved to /var/cache/conftool/dbconfig/20241014-221443-ladsgroup.json
- 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69846 and previous config saved to /var/cache/conftool/dbconfig/20241014-221008-ladsgroup.json
- 22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69845 and previous config saved to /var/cache/conftool/dbconfig/20241014-220504-ladsgroup.json
- 22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69844 and previous config saved to /var/cache/conftool/dbconfig/20241014-220134-ladsgroup.json
- 22:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 22:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69843 and previous config saved to /var/cache/conftool/dbconfig/20241014-215936-ladsgroup.json
- 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69842 and previous config saved to /var/cache/conftool/dbconfig/20241014-214958-ladsgroup.json
- 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69841 and previous config saved to /var/cache/conftool/dbconfig/20241014-214515-ladsgroup.json
- 21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69840 and previous config saved to /var/cache/conftool/dbconfig/20241014-214429-ladsgroup.json
- 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P69839 and previous config saved to /var/cache/conftool/dbconfig/20241014-213902-ladsgroup.json
- 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69838 and previous config saved to /var/cache/conftool/dbconfig/20241014-213453-ladsgroup.json
- 21:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69837 and previous config saved to /var/cache/conftool/dbconfig/20241014-212922-ladsgroup.json
- 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69836 and previous config saved to /var/cache/conftool/dbconfig/20241014-212001-ladsgroup.json
- 21:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 21:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69835 and previous config saved to /var/cache/conftool/dbconfig/20241014-211937-ladsgroup.json
- 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69834 and previous config saved to /var/cache/conftool/dbconfig/20241014-210430-ladsgroup.json
- 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69833 and previous config saved to /var/cache/conftool/dbconfig/20241014-204923-ladsgroup.json
- 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69832 and previous config saved to /var/cache/conftool/dbconfig/20241014-203416-ladsgroup.json
- 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69831 and previous config saved to /var/cache/conftool/dbconfig/20241014-202504-ladsgroup.json
- 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69830 and previous config saved to /var/cache/conftool/dbconfig/20241014-202439-ladsgroup.json
- 20:21 TheresNoTime: UTC late backport window done
- 20:18 samtar@deploy2002: Finished scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) (duration: 08m 14s)
- 20:14 samtar@deploy2002: samtar, pppery: Continuing with sync
- 20:12 samtar@deploy2002: samtar, pppery: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:10 samtar@deploy2002: Started scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)
- 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69829 and previous config saved to /var/cache/conftool/dbconfig/20241014-200932-ladsgroup.json
- 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69828 and previous config saved to /var/cache/conftool/dbconfig/20241014-195425-ladsgroup.json
- 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69827 and previous config saved to /var/cache/conftool/dbconfig/20241014-193918-ladsgroup.json
- 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69826 and previous config saved to /var/cache/conftool/dbconfig/20241014-192956-ladsgroup.json
- 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 18:57 aqu@deploy2002: Finished deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8] (duration: 00m 29s)
- 18:57 aqu@deploy2002: Started deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8]
- 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69825 and previous config saved to /var/cache/conftool/dbconfig/20241014-185225-ladsgroup.json
- 18:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8] (duration: 00m 13s)
- 18:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8]
- 18:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69824 and previous config saved to /var/cache/conftool/dbconfig/20241014-183718-ladsgroup.json
- 18:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69823 and previous config saved to /var/cache/conftool/dbconfig/20241014-182211-ladsgroup.json
- 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69822 and previous config saved to /var/cache/conftool/dbconfig/20241014-180704-ladsgroup.json
- 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69821 and previous config saved to /var/cache/conftool/dbconfig/20241014-170647-ladsgroup.json
- 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69820 and previous config saved to /var/cache/conftool/dbconfig/20241014-170123-ladsgroup.json
- 16:51 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 16:50 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69819 and previous config saved to /var/cache/conftool/dbconfig/20241014-164616-ladsgroup.json
- 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69818 and previous config saved to /var/cache/conftool/dbconfig/20241014-163109-ladsgroup.json
- 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69817 and previous config saved to /var/cache/conftool/dbconfig/20241014-161602-ladsgroup.json
- 16:03 sergi0: Running `sgimeno@mwmaint2002:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
- 15:52 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:46 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:16 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69816 and previous config saved to /var/cache/conftool/dbconfig/20241014-151546-ladsgroup.json
- 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:15 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69815 and previous config saved to /var/cache/conftool/dbconfig/20241014-151521-ladsgroup.json
- 15:07 elukey@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
- 15:06 elukey@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
- 15:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69814 and previous config saved to /var/cache/conftool/dbconfig/20241014-150014-ladsgroup.json
- 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69813 and previous config saved to /var/cache/conftool/dbconfig/20241014-144507-ladsgroup.json
- 14:43 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 14:43 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:41 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:41 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:39 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69812 and previous config saved to /var/cache/conftool/dbconfig/20241014-143000-ladsgroup.json
- 14:16 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1177.eqiad.wmnet
- 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:16 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:12 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
- 14:12 Lucas_WMDE: UTC afternoon backport+config window done
- 14:10 Lucas_WMDE: [untruncated duration: 06m 48s]
- 14:09 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) (duration: 0
- 14:07 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1177.eqiad.wmnet
- 14:07 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1176.eqiad.wmnet
- 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:06 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
- 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) synced to
- 14:03 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
- 14:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176)
- 13:58 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1176.eqiad.wmnet
- 13:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Update interwiki.php (duration: 07m 00s)
- 13:45 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@fbcf880]: T375480 (duration: 01m 07s)
- 13:44 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@fbcf880]: T375480
- 13:41 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 13:41 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Update interwiki.php
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1002.eqiad.wmnet
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:34 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:31 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69811 and previous config saved to /var/cache/conftool/dbconfig/20241014-132944-ladsgroup.json
- 13:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 13:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69810 and previous config saved to /var/cache/conftool/dbconfig/20241014-132918-ladsgroup.json
- 13:26 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1002.eqiad.wmnet
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1001.eqiad.wmnet
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:26 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69809 and previous config saved to /var/cache/conftool/dbconfig/20241014-132409-ladsgroup.json
- 13:22 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:18 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1001.eqiad.wmnet
- 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
- 13:16 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
- 13:15 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
- 13:15 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
- 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69808 and previous config saved to /var/cache/conftool/dbconfig/20241014-131411-ladsgroup.json
- 13:13 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) (duration: 10m 19s)
- 13:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
- 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69807 and previous config saved to /var/cache/conftool/dbconfig/20241014-130904-ladsgroup.json
- 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695)
- 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69806 and previous config saved to /var/cache/conftool/dbconfig/20241014-125904-ladsgroup.json
- 12:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69805 and previous config saved to /var/cache/conftool/dbconfig/20241014-125358-ladsgroup.json
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69804 and previous config saved to /var/cache/conftool/dbconfig/20241014-124554-arnaudb.json
- 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 12:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69803 and previous config saved to /var/cache/conftool/dbconfig/20241014-124532-arnaudb.json
- 12:44 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 12s)
- 12:44 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69802 and previous config saved to /var/cache/conftool/dbconfig/20241014-124357-ladsgroup.json
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-worker1001.eqiad.wmnet
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:41 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69801 and previous config saved to /var/cache/conftool/dbconfig/20241014-123853-ladsgroup.json
- 12:37 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 12:32 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-worker1001.eqiad.wmnet
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-ctrl1001.eqiad.wmnet
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:32 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:30 hnowlan: removed all aqsv1 service components from aqs* hosts
- 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69800 and previous config saved to /var/cache/conftool/dbconfig/20241014-123025-arnaudb.json
- 12:28 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 12:23 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-ctrl1001.eqiad.wmnet
- 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1001.eqiad.wmnet
- 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1001.eqiad.wmnet
- 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69799 and previous config saved to /var/cache/conftool/dbconfig/20241014-121518-arnaudb.json
- 12:09 elukey: increase etcd k8s aux cluster from 3 -> 5 - T344230
- 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69798 and previous config saved to /var/cache/conftool/dbconfig/20241014-120011-arnaudb.json
- 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
- 11:59 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69797 and previous config saved to /var/cache/conftool/dbconfig/20241014-115755-arnaudb.json
- 11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69796 and previous config saved to /var/cache/conftool/dbconfig/20241014-115732-arnaudb.json
- 11:56 Dreamy_Jazz: Started time limited scan on enwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
- 11:56 aborrero@cumin1002: START - Cookbook sre.dns.netbox
- 11:52 btullis@cumin1002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2194.codfw.wmnet onto db2227.codfw.wmnet
- 11:50 btullis@cumin1002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 11:50 hnowlan@deploy2002: Finished deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761) (duration: 15m 38s)
- 11:45 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69794 and previous config saved to /var/cache/conftool/dbconfig/20241014-114341-ladsgroup.json
- 11:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69793 and previous config saved to /var/cache/conftool/dbconfig/20241014-114316-ladsgroup.json
- 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69792 and previous config saved to /var/cache/conftool/dbconfig/20241014-114225-arnaudb.json
- 11:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69791 and previous config saved to /var/cache/conftool/dbconfig/20241014-113941-arnaudb.json
- 11:34 hnowlan@deploy2002: Started deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761)
- 11:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 08s)
- 11:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
- 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69790 and previous config saved to /var/cache/conftool/dbconfig/20241014-112809-ladsgroup.json
- 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69789 and previous config saved to /var/cache/conftool/dbconfig/20241014-112719-arnaudb.json
- 11:26 claime: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
- 11:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69788 and previous config saved to /var/cache/conftool/dbconfig/20241014-112434-arnaudb.json
- 11:16 ladsgroup@deploy2002: Finished scap sync-world: Creating bclwikisource (T377084) (duration: 06m 49s)
- 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69787 and previous config saved to /var/cache/conftool/dbconfig/20241014-111302-ladsgroup.json
- 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69786 and previous config saved to /var/cache/conftool/dbconfig/20241014-111211-arnaudb.json
- 11:10 ladsgroup@deploy2002: Started scap sync-world: Creating bclwikisource (T377084)
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69785 and previous config saved to /var/cache/conftool/dbconfig/20241014-110956-arnaudb.json
- 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69784 and previous config saved to /var/cache/conftool/dbconfig/20241014-110933-arnaudb.json
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69783 and previous config saved to /var/cache/conftool/dbconfig/20241014-110927-arnaudb.json
- 11:07 ladsgroup@deploy2002: Finished scap sync-world: Creating ibawiki (T376568) (duration: 06m 45s)
- 11:05 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 11:01 ladsgroup@deploy2002: Started scap sync-world: Creating ibawiki (T376568)
- 11:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 10:58 ladsgroup@deploy2002: Finished scap sync-world: Creating annwiki (T376332) (duration: 06m 45s)
- 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69782 and previous config saved to /var/cache/conftool/dbconfig/20241014-105755-ladsgroup.json
- 10:55 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69781 and previous config saved to /var/cache/conftool/dbconfig/20241014-105426-arnaudb.json
- 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69780 and previous config saved to /var/cache/conftool/dbconfig/20241014-105421-arnaudb.json
- 10:52 ladsgroup@deploy2002: Started scap sync-world: Creating annwiki (T376332)
- 10:51 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69779 and previous config saved to /var/cache/conftool/dbconfig/20241014-104941-ladsgroup.json
- 10:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 10:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69778 and previous config saved to /var/cache/conftool/dbconfig/20241014-104916-ladsgroup.json
- 10:48 ladsgroup@deploy2002: Finished scap sync-world: Creating tddwiki (T375422) (duration: 06m 46s)
- 10:44 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:44 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:42 ladsgroup@deploy2002: Started scap sync-world: Creating tddwiki (T375422)
- 10:40 ladsgroup@deploy2002: Finished scap sync-world: Creating nrwiki (T375087) (duration: 06m 54s)
- 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69777 and previous config saved to /var/cache/conftool/dbconfig/20241014-103919-arnaudb.json
- 10:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:35 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69776 and previous config saved to /var/cache/conftool/dbconfig/20241014-103410-ladsgroup.json
- 10:33 ladsgroup@deploy2002: Started scap sync-world: Creating nrwiki (T375087)
- 10:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) (duration: 06m 45s)
- 10:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:27 ladsgroup@deploy2002: ladsgroup: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:25 ladsgroup@deploy2002: Started scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421)
- 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69775 and previous config saved to /var/cache/conftool/dbconfig/20241014-102412-arnaudb.json
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69774 and previous config saved to /var/cache/conftool/dbconfig/20241014-102256-arnaudb.json
- 10:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69773 and previous config saved to /var/cache/conftool/dbconfig/20241014-102234-arnaudb.json
- 10:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69772 and previous config saved to /var/cache/conftool/dbconfig/20241014-101903-ladsgroup.json
- 10:17 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2227.codfw.wmnet
- 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69771 and previous config saved to /var/cache/conftool/dbconfig/20241014-101354-ladsgroup.json
- 10:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
- 10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69770 and previous config saved to /var/cache/conftool/dbconfig/20241014-101246-ladsgroup.json
- 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69769 and previous config saved to /var/cache/conftool/dbconfig/20241014-100727-arnaudb.json
- 10:06 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
- 10:06 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
- 10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69768 and previous config saved to /var/cache/conftool/dbconfig/20241014-100356-ladsgroup.json
- 10:00 akosiaris: powercycle rdb1014 T376961
- 10:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
- 10:00 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:00 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:00 ladsgroup@deploy2002: Finished scap sync-world: Creating rskwiki (T374963) (duration: 18m 38s)
- 09:59 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 09:59 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69767 and previous config saved to /var/cache/conftool/dbconfig/20241014-095354-arnaudb.json
- 09:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69766 and previous config saved to /var/cache/conftool/dbconfig/20241014-095331-arnaudb.json
- 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69765 and previous config saved to /var/cache/conftool/dbconfig/20241014-095220-arnaudb.json
- 09:41 ladsgroup@deploy2002: Started scap sync-world: Creating rskwiki (T374963)
- 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69764 and previous config saved to /var/cache/conftool/dbconfig/20241014-093824-arnaudb.json
- 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69763 and previous config saved to /var/cache/conftool/dbconfig/20241014-093713-arnaudb.json
- 09:36 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
- 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69762 and previous config saved to /var/cache/conftool/dbconfig/20241014-093459-arnaudb.json
- 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69761 and previous config saved to /var/cache/conftool/dbconfig/20241014-093418-arnaudb.json
- 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69760 and previous config saved to /var/cache/conftool/dbconfig/20241014-092317-arnaudb.json
- 09:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
- 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69759 and previous config saved to /var/cache/conftool/dbconfig/20241014-091911-arnaudb.json
- 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69758 and previous config saved to /var/cache/conftool/dbconfig/20241014-090810-arnaudb.json
- 09:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69757 and previous config saved to /var/cache/conftool/dbconfig/20241014-090403-arnaudb.json
- 09:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69756 and previous config saved to /var/cache/conftool/dbconfig/20241014-090340-ladsgroup.json
- 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 09:01 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
- 08:58 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 08:55 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 08:55 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
- 08:49 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
- 08:49 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
- 08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 08:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69755 and previous config saved to /var/cache/conftool/dbconfig/20241014-084856-arnaudb.json
- 08:48 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69754 and previous config saved to /var/cache/conftool/dbconfig/20241014-084643-arnaudb.json
- 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69753 and previous config saved to /var/cache/conftool/dbconfig/20241014-084620-arnaudb.json
- 08:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
- 08:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:40 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69752 and previous config saved to /var/cache/conftool/dbconfig/20241014-083113-arnaudb.json
- 08:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69751 and previous config saved to /var/cache/conftool/dbconfig/20241014-081606-arnaudb.json
- 08:13 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
- 08:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:12 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:10 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
- 08:08 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69750 and previous config saved to /var/cache/conftool/dbconfig/20241014-080744-arnaudb.json
- 08:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 08:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69749 and previous config saved to /var/cache/conftool/dbconfig/20241014-080721-arnaudb.json
- 08:07 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
- 08:02 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
- 08:01 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69748 and previous config saved to /var/cache/conftool/dbconfig/20241014-080059-arnaudb.json
- 08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM kubestagemaster2005.codfw.wmnet
- 08:00 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69747 and previous config saved to /var/cache/conftool/dbconfig/20241014-075845-arnaudb.json
- 07:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69746 and previous config saved to /var/cache/conftool/dbconfig/20241014-075823-arnaudb.json
- 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69745 and previous config saved to /var/cache/conftool/dbconfig/20241014-075214-arnaudb.json
- 07:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69744 and previous config saved to /var/cache/conftool/dbconfig/20241014-074317-arnaudb.json
- 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69743 and previous config saved to /var/cache/conftool/dbconfig/20241014-073707-arnaudb.json
- 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69742 and previous config saved to /var/cache/conftool/dbconfig/20241014-072810-arnaudb.json
- 07:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69741 and previous config saved to /var/cache/conftool/dbconfig/20241014-072201-arnaudb.json
- 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69740 and previous config saved to /var/cache/conftool/dbconfig/20241014-071302-arnaudb.json
- 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69739 and previous config saved to /var/cache/conftool/dbconfig/20241014-071048-arnaudb.json
- 07:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69738 and previous config saved to /var/cache/conftool/dbconfig/20241014-071026-arnaudb.json
- 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69737 and previous config saved to /var/cache/conftool/dbconfig/20241014-065519-arnaudb.json
- 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69736 and previous config saved to /var/cache/conftool/dbconfig/20241014-064012-arnaudb.json
- 06:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69735 and previous config saved to /var/cache/conftool/dbconfig/20241014-062505-arnaudb.json
- 06:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69734 and previous config saved to /var/cache/conftool/dbconfig/20241014-062249-arnaudb.json
- 06:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 06:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69733 and previous config saved to /var/cache/conftool/dbconfig/20241014-062135-arnaudb.json
- 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 06:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 06:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69732 and previous config saved to /var/cache/conftool/dbconfig/20241014-042443-ladsgroup.json
- 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69731 and previous config saved to /var/cache/conftool/dbconfig/20241014-040936-ladsgroup.json
- 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69730 and previous config saved to /var/cache/conftool/dbconfig/20241014-035429-ladsgroup.json
- 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69729 and previous config saved to /var/cache/conftool/dbconfig/20241014-033922-ladsgroup.json
- 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69728 and previous config saved to /var/cache/conftool/dbconfig/20241014-033237-ladsgroup.json
- 03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69727 and previous config saved to /var/cache/conftool/dbconfig/20241014-032710-ladsgroup.json
- 03:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69726 and previous config saved to /var/cache/conftool/dbconfig/20241014-031203-ladsgroup.json
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69725 and previous config saved to /var/cache/conftool/dbconfig/20241014-025656-ladsgroup.json
- 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69724 and previous config saved to /var/cache/conftool/dbconfig/20241014-024149-ladsgroup.json
- 02:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69723 and previous config saved to /var/cache/conftool/dbconfig/20241014-023616-ladsgroup.json
- 02:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 02:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69722 and previous config saved to /var/cache/conftool/dbconfig/20241014-023551-ladsgroup.json
- 02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69721 and previous config saved to /var/cache/conftool/dbconfig/20241014-022044-ladsgroup.json
- 02:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69720 and previous config saved to /var/cache/conftool/dbconfig/20241014-020537-ladsgroup.json
- 01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69719 and previous config saved to /var/cache/conftool/dbconfig/20241014-015030-ladsgroup.json
- 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69718 and previous config saved to /var/cache/conftool/dbconfig/20241014-014435-ladsgroup.json
- 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69717 and previous config saved to /var/cache/conftool/dbconfig/20241014-014410-ladsgroup.json
- 01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69716 and previous config saved to /var/cache/conftool/dbconfig/20241014-012903-ladsgroup.json
- 01:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69715 and previous config saved to /var/cache/conftool/dbconfig/20241014-011356-ladsgroup.json
- 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69714 and previous config saved to /var/cache/conftool/dbconfig/20241014-005849-ladsgroup.json
- 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69713 and previous config saved to /var/cache/conftool/dbconfig/20241014-005056-ladsgroup.json
- 00:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69712 and previous config saved to /var/cache/conftool/dbconfig/20241014-005042-ladsgroup.json
- 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69711 and previous config saved to /var/cache/conftool/dbconfig/20241014-003534-ladsgroup.json
- 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69710 and previous config saved to /var/cache/conftool/dbconfig/20241014-002027-ladsgroup.json
- 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69709 and previous config saved to /var/cache/conftool/dbconfig/20241014-000520-ladsgroup.json
2024-10-13
- 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69708 and previous config saved to /var/cache/conftool/dbconfig/20241013-235726-ladsgroup.json
- 23:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 23:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69707 and previous config saved to /var/cache/conftool/dbconfig/20241013-235701-ladsgroup.json
- 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69706 and previous config saved to /var/cache/conftool/dbconfig/20241013-234154-ladsgroup.json
- 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69705 and previous config saved to /var/cache/conftool/dbconfig/20241013-232647-ladsgroup.json
- 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69704 and previous config saved to /var/cache/conftool/dbconfig/20241013-231140-ladsgroup.json
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69703 and previous config saved to /var/cache/conftool/dbconfig/20241013-230403-ladsgroup.json
- 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 12:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
- 12:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
- 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2147', diff saved to https://phabricator.wikimedia.org/P69702 and previous config saved to /var/cache/conftool/dbconfig/20241013-121154-arnaudb.json
- 10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69701 and previous config saved to /var/cache/conftool/dbconfig/20241013-102205-ladsgroup.json
- 10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69700 and previous config saved to /var/cache/conftool/dbconfig/20241013-100658-ladsgroup.json
- 09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69699 and previous config saved to /var/cache/conftool/dbconfig/20241013-095151-ladsgroup.json
- 09:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69698 and previous config saved to /var/cache/conftool/dbconfig/20241013-093644-ladsgroup.json
2024-10-11
- 22:18 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
- 21:38 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
- 21:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
- 21:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
- 21:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
- 21:14 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:49 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
- 16:40 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 00m 42s)
- 16:39 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
- 16:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 01m 06s)
- 16:38 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
- 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
- 16:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
- 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 16:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 16:11 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@1fb69c4]: T376456 (duration: 01m 15s)
- 16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:10 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@1fb69c4]: T376456
- 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:40 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
- 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
- 15:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
- 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:48 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
- 14:48 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
- 14:47 urandom: upgrading data-gateway to v1.0.10
- 14:46 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
- 14:46 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
- 14:39 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
- 14:38 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
- 14:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 25s)
- 14:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
- 13:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T376988', diff saved to https://phabricator.wikimedia.org/P69695 and previous config saved to /var/cache/conftool/dbconfig/20241011-135903-arnaudb.json
- 13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T376988', diff saved to https://phabricator.wikimedia.org/P69694 and previous config saved to /var/cache/conftool/dbconfig/20241011-134357-arnaudb.json
- 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T376988', diff saved to https://phabricator.wikimedia.org/P69693 and previous config saved to /var/cache/conftool/dbconfig/20241011-132852-arnaudb.json
- 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T376988', diff saved to https://phabricator.wikimedia.org/P69692 and previous config saved to /var/cache/conftool/dbconfig/20241011-131347-arnaudb.json
- 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
- 13:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
- 13:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: T376988', diff saved to https://phabricator.wikimedia.org/P69691 and previous config saved to /var/cache/conftool/dbconfig/20241011-125841-arnaudb.json
- 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: T376988', diff saved to https://phabricator.wikimedia.org/P69690 and previous config saved to /var/cache/conftool/dbconfig/20241011-124336-arnaudb.json
- 12:37 hashar: Restarting Gerrit
- 12:34 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scandium.eqiad.wmnet
- 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
- 12:34 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
- 12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 2%: T376988', diff saved to https://phabricator.wikimedia.org/P69688 and previous config saved to /var/cache/conftool/dbconfig/20241011-122830-arnaudb.json
- 12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: T376988', diff saved to https://phabricator.wikimedia.org/P69687 and previous config saved to /var/cache/conftool/dbconfig/20241011-121325-arnaudb.json
- 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69686 and previous config saved to /var/cache/conftool/dbconfig/20241011-114446-ladsgroup.json
- 11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 11:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69685 and previous config saved to /var/cache/conftool/dbconfig/20241011-114424-ladsgroup.json
- 11:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
- 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69684 and previous config saved to /var/cache/conftool/dbconfig/20241011-112917-ladsgroup.json
- 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2092.codfw.wmnet
- 11:27 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2092.codfw.wmnet
- 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2092.codfw.wmnet
- 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2092.codfw.wmnet
- 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2092.codfw.wmnet with OS bullseye
- 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69683 and previous config saved to /var/cache/conftool/dbconfig/20241011-111410-ladsgroup.json
- 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
- 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69682 and previous config saved to /var/cache/conftool/dbconfig/20241011-105903-ladsgroup.json
- 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
- 10:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
- 10:56 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
- 10:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
- 10:53 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
- 10:50 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
- 10:50 fabfur: enabled puppet on R:acme_chief::cert for T376800
- 10:50 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 10:47 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2002.codfw.wmnet
- 10:44 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
- 10:44 fabfur: rebooting acmechief1002|2002 (sequentially) (T376800)
- 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
- 10:37 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
- 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2092.codfw.wmnet with OS bullseye
- 10:34 fabfur: disabled puppet on acmechief1002 (T376800)
- 10:33 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
- 10:33 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
- 10:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:27 jynus@cumin1002: dbctl commit (dc=all): 'depool db2175', diff saved to https://phabricator.wikimedia.org/P69680 and previous config saved to /var/cache/conftool/dbconfig/20241011-102706-jynus.json
- 10:26 fabfur: disabling puppet on R:acme_chief::cert for T376800
- 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69678 and previous config saved to /var/cache/conftool/dbconfig/20241011-095847-ladsgroup.json
- 09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69677 and previous config saved to /var/cache/conftool/dbconfig/20241011-095826-ladsgroup.json
- 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69676 and previous config saved to /var/cache/conftool/dbconfig/20241011-094319-ladsgroup.json
- 09:41 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
- 09:38 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts scandium.eqiad.wmnet
- 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69675 and previous config saved to /var/cache/conftool/dbconfig/20241011-092812-ladsgroup.json
- 09:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:18 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69674 and previous config saved to /var/cache/conftool/dbconfig/20241011-091305-ladsgroup.json
- 08:19 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 08:12 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 08:10 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 08:10 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 08:02 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:00 moritzm: upload ircstream 0.13.0+wmf12u2 to apt.wikimedia.org (sync to latest git and the async_broadcast feature branch) T376014
- 07:59 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 07:56 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 02:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69673 and previous config saved to /var/cache/conftool/dbconfig/20241011-021156-arnaudb.json
- 01:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69672 and previous config saved to /var/cache/conftool/dbconfig/20241011-015649-arnaudb.json
- 01:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69671 and previous config saved to /var/cache/conftool/dbconfig/20241011-014142-arnaudb.json
- 01:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69670 and previous config saved to /var/cache/conftool/dbconfig/20241011-012635-arnaudb.json
- 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69669 and previous config saved to /var/cache/conftool/dbconfig/20241011-012424-arnaudb.json
- 01:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 01:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69668 and previous config saved to /var/cache/conftool/dbconfig/20241011-012401-arnaudb.json
- 01:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69667 and previous config saved to /var/cache/conftool/dbconfig/20241011-010854-arnaudb.json
- 00:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69666 and previous config saved to /var/cache/conftool/dbconfig/20241011-005347-arnaudb.json
- 00:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69665 and previous config saved to /var/cache/conftool/dbconfig/20241011-003840-arnaudb.json
2024-10-10
- 23:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69664 and previous config saved to /var/cache/conftool/dbconfig/20241010-233814-arnaudb.json
- 23:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 23:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69663 and previous config saved to /var/cache/conftool/dbconfig/20241010-233752-arnaudb.json
- 23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69662 and previous config saved to /var/cache/conftool/dbconfig/20241010-232245-arnaudb.json
- 23:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69661 and previous config saved to /var/cache/conftool/dbconfig/20241010-230738-arnaudb.json
- 22:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69660 and previous config saved to /var/cache/conftool/dbconfig/20241010-225231-arnaudb.json
- 22:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69659 and previous config saved to /var/cache/conftool/dbconfig/20241010-225019-arnaudb.json
- 22:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 22:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69658 and previous config saved to /var/cache/conftool/dbconfig/20241010-224957-arnaudb.json
- 22:37 cstone: payments-wiki upgraded from ebb42c67 to 40e4a592
- 22:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69657 and previous config saved to /var/cache/conftool/dbconfig/20241010-223450-arnaudb.json
- 22:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69656 and previous config saved to /var/cache/conftool/dbconfig/20241010-221943-arnaudb.json
- 22:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69655 and previous config saved to /var/cache/conftool/dbconfig/20241010-220437-arnaudb.json
- 22:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69654 and previous config saved to /var/cache/conftool/dbconfig/20241010-220125-arnaudb.json
- 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 22:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 22:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 22:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69653 and previous config saved to /var/cache/conftool/dbconfig/20241010-220043-arnaudb.json
- 21:52 jforrester@deploy2002: Finished deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939 (duration: 00m 08s)
- 21:52 jforrester@deploy2002: Started deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939
- 21:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69652 and previous config saved to /var/cache/conftool/dbconfig/20241010-214536-arnaudb.json
- 21:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69651 and previous config saved to /var/cache/conftool/dbconfig/20241010-213029-arnaudb.json
- 21:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69650 and previous config saved to /var/cache/conftool/dbconfig/20241010-211522-arnaudb.json
- 21:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e] (duration: 00m 51s)
- 21:04 aqu@deploy2002: Started deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e]
- 21:04 thcipriani@deploy2002: Finished scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901) (duration: 08m 56s)
- 20:59 thcipriani@deploy2002: jforrester, thcipriani: Continuing with sync
- 20:57 thcipriani@deploy2002: jforrester, thcipriani: Backport for Update VE core submodule to master (c98f3a542) (T376901) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:55 thcipriani@deploy2002: Started scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901)
- 20:27 eileen: config revision changed from 150b02a9 to 3c6d2054
- 20:23 thcipriani@deploy2002: Finished scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) (duration: 08m 34s)
- 20:18 thcipriani@deploy2002: bpirkle, thcipriani: Continuing with sync
- 20:16 thcipriani@deploy2002: bpirkle, thcipriani: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69649 and previous config saved to /var/cache/conftool/dbconfig/20241010-201456-arnaudb.json
- 20:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:14 thcipriani@deploy2002: Started scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512)
- 20:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69648 and previous config saved to /var/cache/conftool/dbconfig/20241010-201433-arnaudb.json
- 20:05 eileen: civicrm upgraded from 07dee21c to ff3144dd
- 19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69647 and previous config saved to /var/cache/conftool/dbconfig/20241010-195926-arnaudb.json
- 19:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69646 and previous config saved to /var/cache/conftool/dbconfig/20241010-194419-arnaudb.json
- 19:43 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 19:43 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503]
- 19:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69645 and previous config saved to /var/cache/conftool/dbconfig/20241010-192912-arnaudb.json
- 19:23 rzl@deploy2002: Finished scap sync-world: chart version bump for 1078720 (duration: 02m 09s)
- 19:21 rzl@deploy2002: Started scap sync-world: chart version bump for 1078720
- 19:06 eileen: config revision changed from ae4a5be9 to 150b02a9
- 18:50 papaul: maintenance on mr1-eqiad complete
- 18:44 eileen: tools upgraded from 632bf430 to 62f2d170
- 18:29 eileen: tools upgraded from e9c05e30 to 632bf430
- 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69644 and previous config saved to /var/cache/conftool/dbconfig/20241010-182846-arnaudb.json
- 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69643 and previous config saved to /var/cache/conftool/dbconfig/20241010-182808-arnaudb.json
- 18:14 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69642 and previous config saved to /var/cache/conftool/dbconfig/20241010-181301-arnaudb.json
- 18:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 18:00 papaul: ongoing maintenance on mr1-eqiad
- 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69641 and previous config saved to /var/cache/conftool/dbconfig/20241010-175754-arnaudb.json
- 17:57 root@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 17:54 root@cumin1002: START - Cookbook sre.puppet.renew-cert for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 17:47 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
- 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69640 and previous config saved to /var/cache/conftool/dbconfig/20241010-174247-arnaudb.json
- 17:42 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
- 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 17:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 17:38 swfrench-wmf: removing echostore eqiad deployment (depooled) to unblock breaking change - T376766
- 17:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:34 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:34 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:33 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:25 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
- 17:20 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
- 17:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 17:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 17:04 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
- 16:59 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
- 16:58 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
- 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 16:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 16:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 16:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
- 16:49 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
- 16:47 swfrench-wmf: removing echostore codfw deployment (depooled) to unblock breaking change - T376766
- 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69639 and previous config saved to /var/cache/conftool/dbconfig/20241010-164221-arnaudb.json
- 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 16:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69638 and previous config saved to /var/cache/conftool/dbconfig/20241010-164159-arnaudb.json
- 16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
- 16:30 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
- 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69637 and previous config saved to /var/cache/conftool/dbconfig/20241010-162652-arnaudb.json
- 16:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 16:23 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
- 16:21 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 16:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
- 16:13 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
- 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69636 and previous config saved to /var/cache/conftool/dbconfig/20241010-161145-arnaudb.json
- 16:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
- 16:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
- 16:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69635 and previous config saved to /var/cache/conftool/dbconfig/20241010-155638-arnaudb.json
- 15:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69634 and previous config saved to /var/cache/conftool/dbconfig/20241010-155426-arnaudb.json
- 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69633 and previous config saved to /var/cache/conftool/dbconfig/20241010-155345-arnaudb.json
- 15:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
- 15:40 papaul: mr1-drmrs maintenance complete
- 15:39 dancy@deploy2002: Installation of scap version "4.110.0" completed for 211 hosts
- 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69632 and previous config saved to /var/cache/conftool/dbconfig/20241010-153838-arnaudb.json
- 15:35 dancy@deploy2002: Installing scap version "4.110.0" for 211 hosts
- 15:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:28 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 15:25 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 15:23 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
- 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69631 and previous config saved to /var/cache/conftool/dbconfig/20241010-152331-arnaudb.json
- 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69630 and previous config saved to /var/cache/conftool/dbconfig/20241010-150824-arnaudb.json
- 15:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69629 and previous config saved to /var/cache/conftool/dbconfig/20241010-150512-arnaudb.json
- 15:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69628 and previous config saved to /var/cache/conftool/dbconfig/20241010-150433-arnaudb.json
- 15:02 papaul: ongoing maintenance on mr1-drmrs
- 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 14:56 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503]
- 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69626 and previous config saved to /var/cache/conftool/dbconfig/20241010-144926-arnaudb.json
- 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69625 and previous config saved to /var/cache/conftool/dbconfig/20241010-143713-arnaudb.json
- 14:34 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
- 14:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69624 and previous config saved to /var/cache/conftool/dbconfig/20241010-143419-arnaudb.json
- 14:28 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
- 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69623 and previous config saved to /var/cache/conftool/dbconfig/20241010-142206-arnaudb.json
- 14:19 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 14:19 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69622 and previous config saved to /var/cache/conftool/dbconfig/20241010-141912-arnaudb.json
- 14:18 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:18 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69621 and previous config saved to /var/cache/conftool/dbconfig/20241010-141704-arnaudb.json
- 14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 14:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
- 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69620 and previous config saved to /var/cache/conftool/dbconfig/20241010-141642-arnaudb.json
- 14:16 moritzm: failover Ganeti masters in magru to secondary node
- 14:12 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69619 and previous config saved to /var/cache/conftool/dbconfig/20241010-140659-arnaudb.json
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
- 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69618 and previous config saved to /var/cache/conftool/dbconfig/20241010-140135-arnaudb.json
- 13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
- 13:59 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69617 and previous config saved to /var/cache/conftool/dbconfig/20241010-135152-arnaudb.json
- 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
- 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69616 and previous config saved to /var/cache/conftool/dbconfig/20241010-134926-arnaudb.json
- 13:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 13:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 13:48 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
- 13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69615 and previous config saved to /var/cache/conftool/dbconfig/20241010-134628-arnaudb.json
- 13:46 Lucas_WMDE: UTC afternoon backport+config window done
- 13:45 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245) (duration: 07m 16s)
- 13:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
- 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
- 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Continuing with sync
- 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Backport for Use ?? instead of default value in getRawVal() (T376245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245)
- 13:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) (duration: 16m 09s)
- 13:36 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
- 13:35 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns4003.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns4003.wikimedia.org
- 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
- 13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Continuing with sync
- 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69613 and previous config saved to /var/cache/conftool/dbconfig/20241010-133121-arnaudb.json
- 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69612 and previous config saved to /var/cache/conftool/dbconfig/20241010-133113-arnaudb.json
- 13:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 13:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69611 and previous config saved to /var/cache/conftool/dbconfig/20241010-133049-arnaudb.json
- 13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
- 13:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433)
- 13:17 dreamyjazz@deploy2002: Finished scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) (duration: 09m 12s)
- 13:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
- 13:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
- 13:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69610 and previous config saved to /var/cache/conftool/dbconfig/20241010-131542-arnaudb.json
- 13:12 dreamyjazz@deploy2002: dreamyjazz, kharlan: Continuing with sync
- 13:11 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1004.eqiad.wmnet
- 13:11 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1004.eqiad.wmnet
- 13:10 dreamyjazz@deploy2002: dreamyjazz, kharlan: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
- 13:08 dreamyjazz@deploy2002: Started scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517)
- 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
- 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 13:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69609 and previous config saved to /var/cache/conftool/dbconfig/20241010-130035-arnaudb.json
- 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
- 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
- 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
- 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69608 and previous config saved to /var/cache/conftool/dbconfig/20241010-124528-arnaudb.json
- 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69607 and previous config saved to /var/cache/conftool/dbconfig/20241010-124319-arnaudb.json
- 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 12:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69606 and previous config saved to /var/cache/conftool/dbconfig/20241010-124241-arnaudb.json
- 12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bookworm
- 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
- 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
- 12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69605 and previous config saved to /var/cache/conftool/dbconfig/20241010-122734-arnaudb.json
- 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
- 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69604 and previous config saved to /var/cache/conftool/dbconfig/20241010-121227-arnaudb.json
- 12:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bookworm
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69603 and previous config saved to /var/cache/conftool/dbconfig/20241010-115720-arnaudb.json
- 11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69599 and previous config saved to /var/cache/conftool/dbconfig/20241010-114042-arnaudb.json
- 11:34 zabe@deploy2002: Finished scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 06m 58s)
- 11:29 zabe@deploy2002: zabe: Continuing with sync
- 11:29 zabe@deploy2002: zabe: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:27 zabe@deploy2002: Started scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
- 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69598 and previous config saved to /var/cache/conftool/dbconfig/20241010-112535-arnaudb.json
- 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
- 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7001.magru.wmnet
- 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7001.magru.wmnet
- 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69597 and previous config saved to /var/cache/conftool/dbconfig/20241010-111028-arnaudb.json
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69596 and previous config saved to /var/cache/conftool/dbconfig/20241010-110920-arnaudb.json
- 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 11:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69595 and previous config saved to /var/cache/conftool/dbconfig/20241010-110857-arnaudb.json
- 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 10:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69594 and previous config saved to /var/cache/conftool/dbconfig/20241010-105350-arnaudb.json
- 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
- 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
- 10:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
- 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69593 and previous config saved to /var/cache/conftool/dbconfig/20241010-103843-arnaudb.json
- 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
- 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69592 and previous config saved to /var/cache/conftool/dbconfig/20241010-102336-arnaudb.json
- 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
- 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69591 and previous config saved to /var/cache/conftool/dbconfig/20241010-102127-arnaudb.json
- 10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69590 and previous config saved to /var/cache/conftool/dbconfig/20241010-102104-arnaudb.json
- 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69589 and previous config saved to /var/cache/conftool/dbconfig/20241010-100557-arnaudb.json
- 09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubestage1004.eqiad.wmnet
- 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1002.wikimedia.org
- 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1004.eqiad.wmnet
- 09:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69587 and previous config saved to /var/cache/conftool/dbconfig/20241010-095050-arnaudb.json
- 09:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
- 09:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1002.wikimedia.org
- 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69586 and previous config saved to /var/cache/conftool/dbconfig/20241010-093544-arnaudb.json
- 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69585 and previous config saved to /var/cache/conftool/dbconfig/20241010-093335-arnaudb.json
- 09:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 09:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69584 and previous config saved to /var/cache/conftool/dbconfig/20241010-093313-arnaudb.json
- 09:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 09:30 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69583 and previous config saved to /var/cache/conftool/dbconfig/20241010-092735-arnaudb.json
- 09:21 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.26 refs T375657
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69582 and previous config saved to /var/cache/conftool/dbconfig/20241010-091806-arnaudb.json
- 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
- 09:14 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
- 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69581 and previous config saved to /var/cache/conftool/dbconfig/20241010-091228-arnaudb.json
- 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
- 09:10 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
- 09:10 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 09:07 aklapper@deploy2002: Finished scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) (duration: 12m 09s)
- 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
- 09:03 aklapper@deploy2002: hashar, aklapper: Continuing with sync
- 09:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69580 and previous config saved to /var/cache/conftool/dbconfig/20241010-090259-arnaudb.json
- 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
- 08:57 aklapper@deploy2002: hashar, aklapper: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69579 and previous config saved to /var/cache/conftool/dbconfig/20241010-085721-arnaudb.json
- 08:55 aklapper@deploy2002: Started scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)
- 08:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69578 and previous config saved to /var/cache/conftool/dbconfig/20241010-084752-arnaudb.json
- 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69577 and previous config saved to /var/cache/conftool/dbconfig/20241010-084543-arnaudb.json
- 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69576 and previous config saved to /var/cache/conftool/dbconfig/20241010-084521-arnaudb.json
- 08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69575 and previous config saved to /var/cache/conftool/dbconfig/20241010-084214-arnaudb.json
- 08:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
- 08:41 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
- 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69574 and previous config saved to /var/cache/conftool/dbconfig/20241010-084003-arnaudb.json
- 08:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 08:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: T376868', diff saved to https://phabricator.wikimedia.org/P69573 and previous config saved to /var/cache/conftool/dbconfig/20241010-083347-arnaudb.json
- 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69572 and previous config saved to /var/cache/conftool/dbconfig/20241010-083013-arnaudb.json
- 08:21 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
- 08:21 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: T376868', diff saved to https://phabricator.wikimedia.org/P69571 and previous config saved to /var/cache/conftool/dbconfig/20241010-081841-arnaudb.json
- 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
- 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69570 and previous config saved to /var/cache/conftool/dbconfig/20241010-081506-arnaudb.json
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: T376867', diff saved to https://phabricator.wikimedia.org/P69569 and previous config saved to /var/cache/conftool/dbconfig/20241010-080711-arnaudb.json
- 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
- 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: T376868', diff saved to https://phabricator.wikimedia.org/P69568 and previous config saved to /var/cache/conftool/dbconfig/20241010-080336-arnaudb.json
- 08:02 moritzm: irc.wikimedia.org now directs to the ircstream implementation on irc1003.wikimedia.org T376014
- 08:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69567 and previous config saved to /var/cache/conftool/dbconfig/20241010-075959-arnaudb.json
- 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69566 and previous config saved to /var/cache/conftool/dbconfig/20241010-075951-arnaudb.json
- 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
- 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69565 and previous config saved to /var/cache/conftool/dbconfig/20241010-075911-arnaudb.json
- 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: T376867', diff saved to https://phabricator.wikimedia.org/P69564 and previous config saved to /var/cache/conftool/dbconfig/20241010-075206-arnaudb.json
- 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: T376868', diff saved to https://phabricator.wikimedia.org/P69563 and previous config saved to /var/cache/conftool/dbconfig/20241010-074831-arnaudb.json
- 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69562 and previous config saved to /var/cache/conftool/dbconfig/20241010-074404-arnaudb.json
- 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: T376867', diff saved to https://phabricator.wikimedia.org/P69561 and previous config saved to /var/cache/conftool/dbconfig/20241010-073700-arnaudb.json
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudidm2001-dev.codfw.wmnet
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
- 07:33 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
- 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: T376868', diff saved to https://phabricator.wikimedia.org/P69560 and previous config saved to /var/cache/conftool/dbconfig/20241010-073326-arnaudb.json
- 07:33 awight: UTC morning deployments done.
- 07:32 hashar: Stopped gerrit service on gerrit2003.codfw.wmnet since it is not starting up properly | T372804
- 07:32 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:30 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
- 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69559 and previous config saved to /var/cache/conftool/dbconfig/20241010-072857-arnaudb.json
- 07:28 awight@deploy2002: Finished scap sync-world: Backport for [config] Rename moved gadget name setting (T362771) (duration: 09m 22s)
- 07:25 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
- 07:23 awight@deploy2002: awight, wmde-fisch: Continuing with sync
- 07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: T376867', diff saved to https://phabricator.wikimedia.org/P69558 and previous config saved to /var/cache/conftool/dbconfig/20241010-072155-arnaudb.json
- 07:21 awight@deploy2002: awight, wmde-fisch: Backport for [config] Rename moved gadget name setting (T362771) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 07:18 awight@deploy2002: Started scap sync-world: Backport for [config] Rename moved gadget name setting (T362771)
- 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: T376868', diff saved to https://phabricator.wikimedia.org/P69557 and previous config saved to /var/cache/conftool/dbconfig/20241010-071820-arnaudb.json
- 07:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1236 T376868', diff saved to https://phabricator.wikimedia.org/P69556 and previous config saved to /var/cache/conftool/dbconfig/20241010-071721-arnaudb.json
- 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:15 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cloudidm2001-dev.codfw.wmnet
- 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 07:15 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
- 07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary T376868', diff saved to https://phabricator.wikimedia.org/P69555 and previous config saved to /var/cache/conftool/dbconfig/20241010-071453-arnaudb.json
- 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:14 arnaudb: Starting s7 eqiad failover from db1236 to db1181 - T376868
- 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69554 and previous config saved to /var/cache/conftool/dbconfig/20241010-071350-arnaudb.json
- 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69553 and previous config saved to /var/cache/conftool/dbconfig/20241010-071242-arnaudb.json
- 07:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69552 and previous config saved to /var/cache/conftool/dbconfig/20241010-071219-arnaudb.json
- 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T376868', diff saved to https://phabricator.wikimedia.org/P69551 and previous config saved to /var/cache/conftool/dbconfig/20241010-070843-arnaudb.json
- 07:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
- 07:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
- 07:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: T376867', diff saved to https://phabricator.wikimedia.org/P69550 and previous config saved to /var/cache/conftool/dbconfig/20241010-070650-arnaudb.json
- 06:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69549 and previous config saved to /var/cache/conftool/dbconfig/20241010-065712-arnaudb.json
- 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: T376867', diff saved to https://phabricator.wikimedia.org/P69548 and previous config saved to /var/cache/conftool/dbconfig/20241010-065145-arnaudb.json
- 06:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1230 T376867', diff saved to https://phabricator.wikimedia.org/P69547 and previous config saved to /var/cache/conftool/dbconfig/20241010-065048-arnaudb.json
- 06:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary T376867', diff saved to https://phabricator.wikimedia.org/P69546 and previous config saved to /var/cache/conftool/dbconfig/20241010-064827-arnaudb.json
- 06:47 arnaudb: Starting s5 eqiad failover from db1230 to db1183 - T376867
- 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T376867', diff saved to https://phabricator.wikimedia.org/P69545 and previous config saved to /var/cache/conftool/dbconfig/20241010-064219-arnaudb.json
- 06:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
- 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69544 and previous config saved to /var/cache/conftool/dbconfig/20241010-064206-arnaudb.json
- 06:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
- 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69543 and previous config saved to /var/cache/conftool/dbconfig/20241010-062659-arnaudb.json
- 06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69542 and previous config saved to /var/cache/conftool/dbconfig/20241010-062450-arnaudb.json
- 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 06:10 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:03 XioNoX: cr2-eqsin> request vmhost snapshot - T375961
- 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69541 and previous config saved to /var/cache/conftool/dbconfig/20241010-031553-ladsgroup.json
- 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69540 and previous config saved to /var/cache/conftool/dbconfig/20241010-031531-ladsgroup.json
- 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69539 and previous config saved to /var/cache/conftool/dbconfig/20241010-030048-ladsgroup.json
- 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69538 and previous config saved to /var/cache/conftool/dbconfig/20241010-030025-ladsgroup.json
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69537 and previous config saved to /var/cache/conftool/dbconfig/20241010-024543-ladsgroup.json
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69536 and previous config saved to /var/cache/conftool/dbconfig/20241010-024519-ladsgroup.json
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69535 and previous config saved to /var/cache/conftool/dbconfig/20241010-023037-ladsgroup.json
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69534 and previous config saved to /var/cache/conftool/dbconfig/20241010-023014-ladsgroup.json
- 02:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
- 02:02 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
- 01:50 sukhe: restart bird on doh5001 and dns5003 to resolve flapping BFD session after cr2-eqsin junos upgrade
- 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
- 00:46 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1006.eqiad.wmnet
- 00:41 eileen: civicrm upgraded from 3b6a7cbb to 07dee21c
- 00:27 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
- 00:26 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
- 00:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
- 00:19 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2005.codfw.wmnet
- 00:02 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 00:02 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
2024-10-09
- 23:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 23:51 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 23:49 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2003.wikimedia.org
- 23:43 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 23:41 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1005.eqiad.wmnet
- 23:26 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
- 23:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 23:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 23:07 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 23:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 22:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
- 22:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:51 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
- 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69532 and previous config saved to /var/cache/conftool/dbconfig/20241009-225055-ladsgroup.json
- 22:40 eileen: civicrm upgraded from cc7c7744 to 3b6a7cbb
- 22:35 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:30 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:28 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:28 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:01 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
- 22:00 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
- 21:57 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
- 21:57 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
- 21:55 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:54 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:48 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:47 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:45 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
- 21:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
- 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:44 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:42 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:42 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69531 and previous config saved to /var/cache/conftool/dbconfig/20241009-214117-ladsgroup.json
- 21:41 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:32 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
- 21:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69530 and previous config saved to /var/cache/conftool/dbconfig/20241009-212612-ladsgroup.json
- 21:22 mutante: [apt1002:~] $ sudo -i reprepro --component thirdparty/gitlab-bullseye update bullseye-wikimedia
- 21:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
- 21:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69529 and previous config saved to /var/cache/conftool/dbconfig/20241009-211107-ladsgroup.json
- 21:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
- 20:56 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69528 and previous config saved to /var/cache/conftool/dbconfig/20241009-205601-ladsgroup.json
- 20:44 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
- 20:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
- 20:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
- 20:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
- 20:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
- 20:12 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 20:12 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 20:08 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and A:dnsbox
- 20:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
- 19:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
- 19:55 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and A:dnsbox
- 19:55 swfrench-wmf: removing echostore staging deployment to unblock breaking change - T376766
- 19:46 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and A:dnsbox
- 19:46 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
- 19:38 mforns@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:38 mforns@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 19:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
- 19:35 mforns@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:35 mforns@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2002.codfw.wmnet with OS bookworm
- 19:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:27 mforns@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:27 mforns@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 19:20 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
- 19:05 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
- 19:04 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and A:dnsbox
- 19:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
- 19:04 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
- 18:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
- 18:45 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 18:41 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
- 18:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 18:35 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
- 18:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
- 18:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
- 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 18:28 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
- 18:26 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
- 18:26 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus5002.eqsin.wmnet
- 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69527 and previous config saved to /var/cache/conftool/dbconfig/20241009-182632-ladsgroup.json
- 18:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:24 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
- 18:24 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
- 18:19 eileen: config revision changed from 739e8794 to ae4a5be9
- 18:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
- 18:16 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
- 18:16 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
- 18:15 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs[5004-5006].eqsin.wmnet
- 18:15 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs[5004-5006].eqsin.wmnet
- 18:15 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
- 18:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
- 18:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
- 18:08 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
- 18:06 eileen: civicrm upgraded from ae54bd5e to cc7c7744
- 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
- 18:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 17:58 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s5.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
- 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
- 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
- 17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 17:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69526 and previous config saved to /var/cache/conftool/dbconfig/20241009-174501-ladsgroup.json
- 17:44 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
- 17:41 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
- 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
- 17:40 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 17:38 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
- 17:34 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
- 17:31 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
- 17:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69525 and previous config saved to /var/cache/conftool/dbconfig/20241009-172956-ladsgroup.json
- 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
- 17:23 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
- 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
- 17:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
- 17:13 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
- 17:12 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
- 17:12 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
- 16:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69523 and previous config saved to /var/cache/conftool/dbconfig/20241009-165944-ladsgroup.json
- 16:50 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 16:48 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 16:48 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
- 16:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
- 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
- 16:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:32 bvibber: starting requeueTranscodes on old school mwmaint2002 after the k8s blowup last night
- 16:23 sukhe: running authdns-update to fix broken zone files on dns2004
- 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
- 16:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
- 16:21 sukhe: forcing commit 95858ba through sre.dns.netbox
- 16:20 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:05 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
- 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2005.wikimedia.org
- 15:58 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns2005.wikimedia.org
- 15:54 sukhe@cumin1002: END (ERROR) - Cookbook sre.dns.roll-reboot (exit_code=97) rolling reboot on A:dnsbox
- 15:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:53 sukhe: running authdns-update
- 15:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:52 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in2001.wikimedia.org
- 15:49 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
- 15:49 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
- 15:48 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in2001.wikimedia.org
- 15:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in1001.wikimedia.org
- 15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
- 15:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in1001.wikimedia.org
- 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:43 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
- 15:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
- 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp.wikimedia.org on all recursors
- 15:26 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache idp.wikimedia.org on all recursors
- 15:25 fabfur: eqsin depooled for T375961
- 15:24 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
- 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
- 15:24 fabfur@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
- 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
- 15:23 mutante: stewards* - rebooting machines - T351202
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
- 15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
- 15:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
- 15:20 sukhe: running dummy authdns-update
- 15:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:17 mutante: planet.wikimedia.org - rebooting backends
- 15:09 mutante: people.wikimedia.org - rebooting backends
- 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
- 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns1006.wikimedia.org
- 15:07 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns1006.wikimedia.org
- 15:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
- 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
- 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host crm2001.codfw.wmnet
- 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host crm2001.codfw.wmnet
- 14:59 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:58 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
- 14:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
- 14:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
- 14:51 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:51 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
- 14:50 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:50 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
- 14:47 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:47 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:44 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:44 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum
- 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 14:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:29 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
- 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69522 and previous config saved to /var/cache/conftool/dbconfig/20241009-142848-ladsgroup.json
- 14:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 14:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69521 and previous config saved to /var/cache/conftool/dbconfig/20241009-142826-ladsgroup.json
- 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69520 and previous config saved to /var/cache/conftool/dbconfig/20241009-142404-ladsgroup.json
- 14:23 moritzm: failover master for ganeti/routed to ganeti2033
- 14:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 14:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
- 14:21 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
- 14:21 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
- 14:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:20 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 14:20 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:18 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
- 14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
- 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69519 and previous config saved to /var/cache/conftool/dbconfig/20241009-141319-ladsgroup.json
- 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
- 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:11 moritzm: installing Apache security updates
- 14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 14:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2004.wikimedia.org
- 14:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
- 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
- 14:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 14:03 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp2004.wikimedia.org
- 14:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:01 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
- 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69517 and previous config saved to /var/cache/conftool/dbconfig/20241009-135812-ladsgroup.json
- 13:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
- 13:56 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
- 13:55 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
- 13:54 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
- 13:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 13:53 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
- 13:52 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox
- 13:52 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
- 13:51 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:51 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
- 13:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2004.wikimedia.org
- 13:50 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
- 13:50 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
- 13:50 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
- 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
- 13:49 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
- 13:48 Lucas_WMDE: UTC afternoon backport+config window done
- 13:48 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2004.wikimedia.org
- 13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) (duration: 07m 04s)
- 13:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1004.wikimedia.org
- 13:45 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
- 13:45 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host flink-zk1001.eqiad.wmnet
- 13:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
- 13:44 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Continuing with sync
- 13:44 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp1004.wikimedia.org
- 13:43 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1004.wikimedia.org
- 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69516 and previous config saved to /var/cache/conftool/dbconfig/20241009-134305-ladsgroup.json
- 13:42 brouberol@cumin1002: END (ERROR) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=97) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
- 13:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1028.eqiad.wmnet
- 13:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747)
- 13:41 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
- 13:39 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1004.wikimedia.org
- 13:39 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum
- 13:39 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ printf 'https://en.wikipedia.org/static/images/%s\n' 'project-logos/sdwiki.png' 'project-logos/sdwiki-1.5x.png' 'project-logos/sdwiki-2x.png' 'mobile/copyright/wikipedia-wordmark-sd.svg' 'mobile/copyright/wikipedia-tagline-sd.svg' | mwscript-k8s --attach -- purgeList.php # T376536
- 13:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536) (duration: 19m 34s)
- 13:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
- 13:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gerrit2003.wikimedia.org
- 13:31 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
- 13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Continuing with sync
- 13:30 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
- 13:28 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
- 13:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 13:23 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 13:22 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1004.eqiad.wmnet
- 13:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Backport for sdwiki: Add new logo and tagline (T376536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad1004.eqiad.wmnet
- 13:16 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad2002.codfw.wmnet
- 13:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536)
- 13:14 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) (duration: 10m 37s)
- 13:12 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad2002.codfw.wmnet
- 13:09 kharlan@deploy2002: kharlan: Continuing with sync
- 13:06 kharlan@deploy2002: kharlan: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517)
- 12:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2002.codfw.wmnet
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 12:41 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 12:38 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 12:33 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts rpki2002.codfw.wmnet
- 12:24 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:24 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:23 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:23 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:18 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
- 12:16 jelto@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:54 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b2c30ad]: T375153 (duration: 02m 32s)
- 11:52 jynus: ran systemctl start wmf_auto_restart_routinator.service on rpki2003
- 11:52 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b2c30ad]: T375153
- 11:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69513 and previous config saved to /var/cache/conftool/dbconfig/20241009-111154-ladsgroup.json
- 11:04 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 11:00 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 11:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69511 and previous config saved to /var/cache/conftool/dbconfig/20241009-105647-ladsgroup.json
- 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
- 10:44 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 10:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69507 and previous config saved to /var/cache/conftool/dbconfig/20241009-104142-ladsgroup.json
- 10:35 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 10:28 elukey: roll restart swift-proxy on ms-fe* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078380
- 10:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1027.eqiad.wmnet
- 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69506 and previous config saved to /var/cache/conftool/dbconfig/20241009-102636-ladsgroup.json
- 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
- 10:11 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 09:42 Dreamy_Jazz: Started time-limited MediaModeration scan on enwiki for 16hrs to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1026.eqiad.wmnet
- 08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:51 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:48 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:23 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1005.eqiad.wmnet
- 08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1005.eqiad.wmnet
- 08:12 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.26 refs T375657
- 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
- 08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1021.eqiad.wmnet
- 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
- 07:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1011.eqiad.wmnet
- 07:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:13 moritzm: remove ganeti2010 from active nodes T376594
- 06:37 eileen: civicrm upgraded from 251e958f to ae54bd5e
- 06:08 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 06:06 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 03:36 eileen: civicrm upgraded from 61718eae to 251e958f
- 01:26 eileen: tools upgraded from 3f7b238d to e9c05e30
- 00:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm
2024-10-08
- 22:36 tzatziki: removing 1 file for legal compliance
- 22:32 tzatziki: removing 3 files for legal compliance
- 22:16 tzatziki: removing 1 file for legal compliance
- 22:11 tzatziki: removing 3 files for legal compliance
- 21:59 tzatziki: removing 3 files for legal compliance
- 21:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
- 21:41 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
- 21:35 bvibber: running requeueTranscodes in k8s maint to clean up iOS video transcodes (T363966)
- 21:34 mutante: gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412
- 21:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1022.eqiad.wmnet with OS bullseye
- 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 21:06 eileen: config revision changed from 9ba217d2 to c84a1354
- 21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
- 20:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
- 20:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 cjming: end of UTC late backport window
- 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:52 cjming@deploy2002: Finished scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) (duration: 07m 39s)
- 20:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:48 cjming@deploy2002: bvibber, cjming: Continuing with sync
- 20:47 cjming@deploy2002: bvibber, cjming: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:45 cjming@deploy2002: Started scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966)
- 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit (duration: 09m 58s)
- 20:37 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
- 20:34 cjming@deploy2002: jdlrobson, cjming: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:32 cjming@deploy2002: Started scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit
- 20:29 cjming@deploy2002: Finished scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549) (duration: 19m 28s)
- 20:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:24 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:12 cjming@deploy2002: jdlrobson, cjming: Backport for Expand Vector 2022 roll out and support local variants (T375549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1012
- 20:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host backup1012
- 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
- 20:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
- 20:10 cjming@deploy2002: Started scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549)
- 20:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 18:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:54 swfrench-wmf: ran authdns-update on dns1004 to pick up mwdebug-next record - T372604
- 18:50 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mwdebug-next,name=codfw [reason: pooling mwdebug-next in codfw to match mwdebug - T372604]
- 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
- 18:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
- 18:43 cdanis: 💔cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin -b1 -s120 A:dnsbox 'run-puppet-agent --enable "cdanis rolling out T344171 Ie7d5091bca40"'
- 18:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin A:dnsbox 'disable-puppet "cdanis rolling out T344171 Ie7d5091bca40"'
- 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 18:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 17:45 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
- 17:39 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
- 17:35 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
- 17:35 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
- 17:34 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:codfw' - T372604
- 17:27 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
- 17:21 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
- 17:17 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
- 17:12 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
- 17:09 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:eqiad' - T372604
- 17:04 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T372604
- 16:57 moritzm: enable Puppet fleet-wide for puppetmaster1001 hardware maintenance
- 16:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused (duration: 06m 50s)
- 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
- 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
- 16:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 16:44 dreamyjazz@deploy2002: dreamyjazz: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
- 16:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
- 16:42 dreamyjazz@deploy2002: Started scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused
- 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
- 16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
- 16:37 moritzm: disable Puppet fleet-wide for puppetmaster1001 hardware maintenance
- 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
- 16:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 16:25 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 16:24 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
- 16:23 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 16:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 16:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 16:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 16:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:41 papaul: mr1-magru end of maintenance
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
- 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
- 15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
- 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
- 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
- 15:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
- 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720 (duration: 01m 07s)
- 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720
- 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720 (duration: 00m 26s)
- 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720
- 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
- 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
- 15:02 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
- 15:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
- 15:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
- 14:58 papaul: mr1-magru ongoing maintenance
- 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:47 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
- 14:41 moritzm: installing python-aiosmtpd security updates
- 14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
- 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
- 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1010.eqiad.wmnet
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
- 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 14:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
- 14:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
- 14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1009.eqiad.wmnet
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 14:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) (duration: 07m 03s)
- 13:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:49 zabe@deploy2002: zabe: Continuing with sync
- 13:48 zabe@deploy2002: zabe: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:46 zabe@deploy2002: Started scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180)
- 13:46 zabe@deploy2002: Finished scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 10s)
- 13:41 zabe@deploy2002: zabe: Continuing with sync
- 13:41 zabe@deploy2002: zabe: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 zabe@deploy2002: Started scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 13:33 Lucas_WMDE: UTC afternoon backport+config window done
- 13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) (duration: 06m 56s)
- 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Continuing with sync
- 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795)
- 13:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 13:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) (duration: 08m 17s)
- 13:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:07 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Continuing with sync
- 13:06 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049)
- 12:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:57 Amir1: dropping povwatch_log on all.dblist (T54924 and T376627)
- 12:55 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti2036.codfw.wmnet
- 12:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Remove flow from techconductwiki (T332022) (duration: 09m 27s)
- 12:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:45 moritzm: installing lua5.4 bugfix updates
- 12:44 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:42 ladsgroup@deploy2002: ladsgroup: Backport for Remove flow from techconductwiki (T332022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Remove flow from techconductwiki (T332022)
- 12:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 12:29 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 12:26 moritzm: remove ganeti2009 from active nodes T376594
- 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
- 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
- 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
- 12:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1008.eqiad.wmnet
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
- 12:01 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 11:56 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 11:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1007.eqiad.wmnet
- 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
- 11:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
- 11:33 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 11:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2002.codfw.wmnet
- 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2002.codfw.wmnet
- 11:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1006.eqiad.wmnet
- 11:28 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
- 11:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 11:06 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 10:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
- 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:45 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
- 10:36 jayme: updated kubernetes 1.23.14-3 -> 1.23.14-4 on P:kubernetes::node - T362408
- 10:27 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:26 jayme: re-enable puppet on all P:kubernetes::node
- 10:26 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:09 jayme: disabled puppet on all P:kubernetes::node
- 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 10:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 09:52 moritzm: installing freetype bugfix updates from Bookworm point update
- 09:48 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 09:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
- 09:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:25 jayme: imported kubernetes 1.23.14-4 to component/kubernetes123 (buster, bullseye, bookworm) - T362408
- 09:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1005.eqiad.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2036.codfw.wmnet to cluster codfw and group C
- 09:12 Dreamy_Jazz: Maintenance script for T376340 finished
- 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2036.codfw.wmnet to cluster codfw and group C
- 09:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:06 Dreamy_Jazz: Ran `mwscript-k8s --comment="T376340" -- extensions/GlobalBlocking/maintenance/UpdateAutoBlockParentIdColumn.php --wiki=aawikibooks`
- 09:01 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 08:55 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 08:55 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 08:54 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 08:53 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 08:53 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 08:20 dcausse: repooling wdqs1013
- 08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:19 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.26 refs T375657
- 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69498 and previous config saved to /var/cache/conftool/dbconfig/20241008-081620-arnaudb.json
- 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69497 and previous config saved to /var/cache/conftool/dbconfig/20241008-080115-arnaudb.json
- 07:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69496 and previous config saved to /var/cache/conftool/dbconfig/20241008-074609-arnaudb.json
- 07:44 vgutierrez: uploaded golang-github-jvgutierrez-go-etcd-harness 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 07:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69495 and previous config saved to /var/cache/conftool/dbconfig/20241008-073104-arnaudb.json
- 07:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: T374215', diff saved to https://phabricator.wikimedia.org/P69494 and previous config saved to /var/cache/conftool/dbconfig/20241008-071559-arnaudb.json
- 07:10 dcausse: depooling wdqs1013 (lag)
- 07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69493 and previous config saved to /var/cache/conftool/dbconfig/20241008-070053-arnaudb.json
- 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69492 and previous config saved to /var/cache/conftool/dbconfig/20241008-064548-arnaudb.json
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.23 (duration: 00m 58s)
- 03:50 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657 (duration: 47m 44s)
- 03:16 eileen: civicrm upgraded from 8b13ef22 to 61718eae
- 03:15 eileen: config revision changed from 6e649356 to 9ba217d2
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657
- 00:55 eileen: config revision changed from 856e4d99 to 6e649356
- 00:30 eileen: config revision changed from 856e4d99 to 4ab498d2 - disable process control to load triggers
2024-10-07
- 22:33 eileen: civicrm upgraded from f2095695 to 8b13ef22
- 22:09 eileen: config revision changed from a2ba4a8d to 856e4d99
- 21:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 20:20 urbanecm@deploy2002: Finished scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) (duration: 07m 41s)
- 20:16 urbanecm@deploy2002: esanders, derenrich, urbanecm: Continuing with sync
- 20:14 urbanecm@deploy2002: esanders, derenrich, urbanecm: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:12 urbanecm@deploy2002: Started scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022)
- 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 18:22 swfrench-wmf: running `git restore helmfile.d/services/thumbor/values.yaml` on deploy1003 to unblock git-pull timer
- 18:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 18:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:29 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-debug next bring-up - T372604 (duration: 02m 45s)
- 17:26 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-debug next bring-up - T372604
- 17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:26 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
- 16:24 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 16:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
- 16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
- 15:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
- 15:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
- 15:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
- 15:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
- 15:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
- 15:00 papaul: ongoing maintenance on mr1-esams
- 14:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 14:40 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 14:18 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
- 14:16 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
- 14:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
- 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69489 and previous config saved to /var/cache/conftool/dbconfig/20241007-134950-ladsgroup.json
- 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69488 and previous config saved to /var/cache/conftool/dbconfig/20241007-134929-ladsgroup.json
- 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 13:37 vgutierrez: switching to digicert-2024 certificates on esams, eqsin, drmrs and magru
- 13:36 Lucas_WMDE: UTC afternoon backport+config window done
- 13:35 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) (duration: 06m 49s)
- 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69487 and previous config saved to /var/cache/conftool/dbconfig/20241007-133422-ladsgroup.json
- 13:31 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 13:30 dreamyjazz@deploy2002: dreamyjazz: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:28 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052)
- 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69486 and previous config saved to /var/cache/conftool/dbconfig/20241007-131915-ladsgroup.json
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402) (duration: 07m 14s)
- 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Continuing with sync
- 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Backport for scandium is being replaced by parsoidtest1001 (T363402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69485 and previous config saved to /var/cache/conftool/dbconfig/20241007-130409-ladsgroup.json
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402)
- 13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 12:53 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/%s\n' 'mobile/copyright/wikimaniawiki-wordmark.svg' 'project-logos/wikimaniawiki-1.5x.png' 'project-logos/wikimaniawiki-2x.png' 'project-logos/wikimaniawiki.png' 'icons/wikimaniawiki.svg' | mwscript-k8s --attach -- purgeList enwiki # T376292
- 12:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 12:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 11:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:29 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:16 vgutierrez: uploaded golang-github-mtchavez-jenkins 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 11:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69484 and previous config saved to /var/cache/conftool/dbconfig/20241007-110430-arnaudb.json
- 10:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2002.codfw.wmnet
- 10:50 Dreamy_Jazz: Started 2-day scan on enwiki for MediaModeration to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
- 10:49 Dreamy_Jazz: Started MediaModeration scanning script after it crashed for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
- 10:49 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2002.codfw.wmnet
- 10:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69483 and previous config saved to /var/cache/conftool/dbconfig/20241007-104925-arnaudb.json
- 10:47 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 10:47 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69482 and previous config saved to /var/cache/conftool/dbconfig/20241007-103420-arnaudb.json
- 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69481 and previous config saved to /var/cache/conftool/dbconfig/20241007-101914-arnaudb.json
- 10:17 vgutierrez: uploaded golang-github-cloudflare-ipvs 0.10.2 to apt.wm.o (bookworm-wikimedia) - T376600
- 10:13 moritzm: installing Linux 6.1.112 on Bookworm systems
- 10:11 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 10:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69480 and previous config saved to /var/cache/conftool/dbconfig/20241007-100410-arnaudb.json
- 10:00 vgutierrez: uploaded golang-github-flyingmutant-rapid 1.1.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69478 and previous config saved to /var/cache/conftool/dbconfig/20241007-094904-arnaudb.json
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 2%: T374215', diff saved to https://phabricator.wikimedia.org/P69477 and previous config saved to /var/cache/conftool/dbconfig/20241007-093359-arnaudb.json
- 09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
- 09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
- 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'missing commit', diff saved to https://phabricator.wikimedia.org/P69476 and previous config saved to /var/cache/conftool/dbconfig/20241007-092714-arnaudb.json
- 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69474 and previous config saved to /var/cache/conftool/dbconfig/20241007-091953-arnaudb.json
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69473 and previous config saved to /var/cache/conftool/dbconfig/20241007-091854-arnaudb.json
- 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 08:37 aqu@deploy2002: Finished deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f] (duration: 04m 43s)
- 08:32 aqu@deploy2002: Started deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f]
- 08:24 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 08:24 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 08:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 18s)
- 08:02 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:02 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 08:02 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 08:01 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:01 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 08:00 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
- 07:57 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
- 07:56 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T374215 db1233 depool as clone source for db1246', diff saved to https://phabricator.wikimedia.org/P69471 and previous config saved to /var/cache/conftool/dbconfig/20241007-075611-arnaudb.json
- 07:56 hashar: UTC morning backport window completed
- 07:54 hashar@deploy2002: Finished scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) (duration: 11m 19s)
- 07:49 hashar@deploy2002: ammarpad, hashar: Continuing with sync
- 07:45 hashar@deploy2002: ammarpad, hashar: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:43 hashar@deploy2002: Started scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049)
- 07:42 hashar@deploy2002: Finished scap sync-world: Backport for Revert "wikimaniawiki: Update logos to 2024" (duration: 21m 40s)
- 07:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 07:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64315
- 07:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 64315
- 07:04 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
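On this day db1233 was repooled in steps of 1, 2, 5, 10, 25, 50, 75 and 100 percent, with one dbctl commit roughly every 15 minutes from 09:18 to 11:04. The sketch below computes that kind of plan in Python. The step list and interval are read off the entries above. `repool_schedule` is a hypothetical helper that only produces the timetable; it does not call dbctl or conftool.

```python
from datetime import datetime, timedelta

# Percentages used by the db1233 "(re)pooling @ N%" commits on 2024-10-07.
STEPS = (1, 2, 5, 10, 25, 50, 75, 100)


def repool_schedule(start, interval=timedelta(minutes=15), steps=STEPS):
    """Return (time, percent) pairs for a gradual repool, one step per interval.

    This only plans the steps; it does not talk to dbctl or conftool.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


plan = repool_schedule(datetime(2024, 10, 7, 9, 18))
for when, pct in plan:
    print(f"{when:%H:%M} db1233 (re)pooling @ {pct}%")
```

A fixed interval is the simplest choice. In the log, each step actually landed about 15 minutes and 5 seconds after the previous one, so the real 100% commit came at 11:04 rather than the idealized 11:03.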
2024-10-06
2024-10-05
- 19:43 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
- 16:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 16:41 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 16:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 16:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 16:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 16:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69470 and previous config saved to /var/cache/conftool/dbconfig/20241005-133058-ladsgroup.json
- 13:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 13:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 13:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69469 and previous config saved to /var/cache/conftool/dbconfig/20241005-133036-ladsgroup.json
- 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69468 and previous config saved to /var/cache/conftool/dbconfig/20241005-131529-ladsgroup.json
- 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69467 and previous config saved to /var/cache/conftool/dbconfig/20241005-130022-ladsgroup.json
- 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69466 and previous config saved to /var/cache/conftool/dbconfig/20241005-124515-ladsgroup.json
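The downtime cookbook entries print their window as `2 days, 12:00:00`, `1 day, 0:00:00`, `12:00:00`, `1:00:00` and so on. These strings match the output of Python's `str(datetime.timedelta)`. This sketch assumes, but does not confirm, that the cookbook formats them that way. The hypothetical `parse_downtime` helper below turns such a string back into a duration. As an example, it computes when the db1192 downtime set at 13:30 on 2024-10-05 would expire.

```python
import re
from datetime import datetime, timedelta

# "2 days, 12:00:00", "1 day, 0:00:00", "12:00:00" as printed in the downtime entries.
DOWNTIME_RE = re.compile(r"^(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})$")


def parse_downtime(text: str) -> timedelta:
    """Invert str(timedelta) for non-negative, whole-second durations."""
    match = DOWNTIME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))


for text in ["2 days, 12:00:00", "1 day, 0:00:00", "7 days, 0:00:00", "12:00:00", "1:00:00"]:
    delta = parse_downtime(text)
    assert str(delta) == text  # round-trips through timedelta's own formatting
    print(f"{text!r} -> {delta.total_seconds() / 3600:g} hours")

start = datetime(2024, 10, 5, 13, 30)
print("db1192 downtime ends", start + parse_downtime("2 days, 12:00:00"))
```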
2024-10-04
- 17:48 ejegg: fundraising civicrm upgraded from 90199f62 to 45855ff4
- 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
- 16:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 14:29 mforns@deploy2002: Finished deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist (duration: 01m 48s)
- 14:28 mforns@deploy2002: Started deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist
- 13:54 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.categories-reload (exit_code=97) reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:32 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 12:00 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 01m 13s)
- 11:59 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
- 11:47 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 00m 47s)
- 11:46 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
- 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
- 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1004.wikimedia.org
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1004.wikimedia.org
- 10:07 moritzm: upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
- 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database shnwikinews (T375432)
- 09:35 moritzm: upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014
- 09:18 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database shnwikinews (T375432)
- 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kgewiki (T374814)
- 09:17 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kgewiki (T374814)
- 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database gorwikiquote (T375094)
- 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database gorwikiquote (T375094)
- 09:16 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwiktionary (T375023)
- 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database madwiktionary (T375023)
- 09:15 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database moswiki (T375568)
- 09:15 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database moswiki (T375568)
- 09:09 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:58 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
- 07:51 oblivian@puppetserver1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
- 07:30 hashar: upgrading Jenkins on CI Jenkins
- 07:04 moritzm: import jenkins 2.462.3 to thirdparty/ci T376449
- 01:45 ejegg: payments-wiki upgraded from e88750e6 to ed2d78b3
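The cookbook START/END lines share a regular shape. In this log, END lines carry the statuses PASS, FAIL and ERROR, with exit codes 0, 99 and 97 respectively. The sketch below parses those lines with a regular expression inferred only from the entries shown here, not from the cookbook framework's source. `parse_cookbook_line` and `CookbookEvent` are hypothetical names. Lines that do not fit the pattern, such as conftool actions or free-text notes, return `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Shape of the cookbook lines in this log, e.g.
# "- 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host ..."
COOKBOOK_RE = re.compile(
    r"^-?\s*(?P<time>\d{2}:\d{2}) (?P<user>[\w.-]+)@(?P<host>[\w.-]+): "
    r"(?P<phase>START|END)(?: \((?P<status>\w+)\))? - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?(?P<rest>.*)$"
)


@dataclass
class CookbookEvent:
    time: str
    user: str
    host: str
    phase: str               # "START" or "END"
    status: Optional[str]    # PASS / FAIL / ERROR on END lines, None on START lines
    cookbook: str
    exit_code: Optional[int]
    rest: str                # free-text target/reason after the cookbook name


def parse_cookbook_line(line: str) -> Optional[CookbookEvent]:
    """Parse one START/END cookbook entry, or return None for any other entry."""
    m = COOKBOOK_RE.match(line)
    if m is None:
        return None
    exit_code = m.group("exit_code")
    return CookbookEvent(
        time=m.group("time"), user=m.group("user"), host=m.group("host"),
        phase=m.group("phase"), status=m.group("status"), cookbook=m.group("cookbook"),
        exit_code=int(exit_code) if exit_code is not None else None,
        rest=m.group("rest").strip(),
    )


lines = [
    "- 13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet",
    "- 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet",
    "- 07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet",
]
events = [e for e in map(parse_cookbook_line, lines) if e is not None]
for e in events:
    print(e.time, e.phase, e.status, e.cookbook, e.exit_code)
```

Matching each START with the next END for the same cookbook, user and target would then give per-run outcomes. An example is the repeated sre.wdqs.categories-reload failures on wdqs-categories1001.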
2024-10-03
- 22:37 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) (duration: 07m 04s)
- 22:33 brennen@deploy2002: brennen: Continuing with sync
- 22:32 brennen@deploy2002: brennen: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:30 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 22:18 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
- 22:18 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 22:15 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
- 22:15 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 21:39 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 21:28 brennen: end of UTC late backport & config window
- 21:28 brennen@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713) (duration: 15m 30s)
- 21:23 brennen@deploy2002: cscott, brennen: Continuing with sync
- 21:15 brennen@deploy2002: cscott, brennen: Backport for Turn on Parsoid Selective Update metrics (T371713) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 brennen@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713)
- 21:11 brennen@deploy2002: Finished scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) (duration: 10m 09s)
- 21:06 brennen@deploy2002: cscott, brennen: Continuing with sync
- 21:02 brennen@deploy2002: cscott, brennen: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:00 brennen@deploy2002: Started scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2)
- 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
- 20:44 brennen@deploy2002: Finished scap sync-world: Backport for Update jquery.ime from upstream (duration: 09m 25s)
- 20:39 brennen@deploy2002: brennen, amire80: Continuing with sync
- 20:37 brennen@deploy2002: brennen, amire80: Backport for Update jquery.ime from upstream synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:34 brennen@deploy2002: Started scap sync-world: Backport for Update jquery.ime from upstream
- 20:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 20:02 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 19:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 19:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:50 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:35 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:28 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS (duration: 03m 02s)
- 19:25 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS
- 19:25 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
- 19:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
- 19:22 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93]: 0.3.148 (duration: 08m 42s)
- 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:14 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.148` on canary `wdqs1016`; proceeding to rest of fleet
- 19:14 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93]: 0.3.148
- 19:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.148`. Pre-deploy tests passing on canary `wdqs1016`
- 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:05 dduvall@deploy2002: Installing scap version "4.109.0" for 210 hosts
- 18:51 cmooney@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: testing T344171]
- 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:31 cstone: SmashPig upgraded from df2a9c42 to eaa176f7
- 18:28 sukhe: depool dns1005 for all services for testing T344171
- 18:00 mutante: codesearch - ran out of disk due to 11G /var/log/account/pacct file - manually ran /etc/cron.daily/acct to rotate it, then deleted old file, back to 39% disk usage
- 17:41 mutante: codesearch was broken - VM was down - rebooted - restarting all the indices is a bit slow but mostly back up now
- 17:13 swfrench@deploy2002: Finished scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934 (duration: 02m 50s)
- 17:11 swfrench@deploy2002: Started scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934
- 15:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 59.75.192.10.in-addr.arpa on all recursors
- 15:53 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 59.75.192.10.in-addr.arpa on all recursors
- 15:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:50 topranks: merging patch to add k8s pod IP range reverse delegations to dns T376291
- 15:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:47 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:36 papaul: Junos upgrade on mr1-codfw complete
- 15:00 papaul: ongoing Junos upgrade on mr1-codfw
- 14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402 (duration: 03m 33s)
- 14:52 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402
- 14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:30 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1022
- 14:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
- 14:29 jclark@cumin1002: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host aqs1022
- 14:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
- 14:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
- 14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
- 13:42 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
- 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2004.wikimedia.org
- 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2004.wikimedia.org with OS bookworm
- 13:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:31 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:30 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
- 13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
- 13:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2004.wikimedia.org with OS bookworm
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:09 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2004.wikimedia.org on all recursors
- 13:09 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2004.wikimedia.org on all recursors
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:00 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:00 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2004.wikimedia.org
- 12:20 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124) (duration: 06m 47s)
- 12:14 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
- 12:13 urbanecm@deploy2002: scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.108.0-1) (duration: 08m 02s)
- 12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:05 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
- 12:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69458 and previous config saved to /var/cache/conftool/dbconfig/20241003-111544-ladsgroup.json
- 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69457 and previous config saved to /var/cache/conftool/dbconfig/20241003-111522-ladsgroup.json
- 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69456 and previous config saved to /var/cache/conftool/dbconfig/20241003-110015-ladsgroup.json
- 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69454 and previous config saved to /var/cache/conftool/dbconfig/20241003-104508-ladsgroup.json
- 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69453 and previous config saved to /var/cache/conftool/dbconfig/20241003-103001-ladsgroup.json
- 10:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124) (duration: 06m 54s)
- 10:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:22 urbanecm@deploy2002: Started scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124)
- 10:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1004.wikimedia.org
- 10:00 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b715af7]: T375153 (duration: 02m 44s)
- 10:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM irc1004.wikimedia.org
- 09:58 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b715af7]: T375153
- 09:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 09:38 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:38 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 09:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 08:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.25 refs T375656
- 08:25 hashar@deploy2002: Finished scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) (duration: 07m 07s)
- 08:20 hashar@deploy2002: hashar, cscott: Continuing with sync
- 08:20 hashar@deploy2002: hashar, cscott: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:18 hashar@deploy2002: Started scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323)
- 08:14 hashar@deploy2002: Finished scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) (duration: 08m 37s)
- 08:09 hashar@deploy2002: hashar, hamishz: Continuing with sync
- 08:07 hashar@deploy2002: hashar, hamishz: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:05 hashar@deploy2002: Started scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898)
- 08:03 hashar: Ran `mwscript resetAuthenticationThrottle.php --signup --ip 14.139.82.6` for `metawiki`, `mediawikiwiki` and `wikidatawiki` # T375794
- 07:59 hashar@deploy2002: Finished scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) (duration: 08m 41s)
- 07:54 hashar@deploy2002: anzx, hamishz, hashar: Continuing with sync
- 07:53 hashar@deploy2002: anzx, hamishz, hashar: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:50 hashar@deploy2002: Started scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794)
- 07:17 kartik@deploy2002: Finished scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) (duration: 10m 39s)
- 07:12 kartik@deploy2002: kartik: Continuing with sync
- 07:08 kartik@deploy2002: kartik: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:06 kartik@deploy2002: Started scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644)
- 06:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 06:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
2024-10-02
- 23:47 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124) (duration: 07m 07s)
- 23:39 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124)
- 22:35 urbanecm@deploy2002: Finished scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124) (duration: 06m 52s)
- 22:28 urbanecm@deploy2002: Started scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124)
- 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
- 21:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
- 21:24 mutante: phab1004 - link=$(/usr/bin/readlink -f /srv/phab) ; /usr/bin/git config -f /etc/gitconfig.d/10-phab-deploy-safedir.gitconfig --add safe.directory $link ; /bin/cat /etc/gitconfig.d/*.gitconfig > /etc/gitconfig - T360756
- 20:57 eileen: civicrm upgraded from 28fd5e3b to 90199f62
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1001.eqiad.wmnet with OS bookworm
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1002.eqiad.wmnet with OS bookworm
- 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
- 19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
- 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
- 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
- 19:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1002.eqiad.wmnet with OS bookworm
- 19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1001.eqiad.wmnet with OS bookworm
- 19:23 cstone: SmashPig upgraded from 715e91fa to df2a9c42
- 19:21 brett: cumin -b11 "A:cp" "run-puppet-agent --enable 'rolling out 1038884'"
- 19:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 19:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
- 19:06 brett@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
- 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 18:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:21 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256 (duration: 00m 12s)
- 18:21 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256
- 18:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.25 refs T375656
- 18:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:22 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
- 17:20 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
- 17:02 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1003.eqiad.wmnet
- 17:02 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
- 17:01 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
- 17:00 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) (duration: 14m 42s)
- 16:58 btullis@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
- 16:56 urbanecm@deploy2002: urbanecm: Continuing with sync
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts alert[1001,2001].wikimedia.org
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
- 16:49 denisse@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
- 16:48 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:46 denisse@cumin2002: START - Cookbook sre.dns.netbox
- 16:46 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124)
- 16:38 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
- 16:33 taavi: start extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php on labswiki to backfill global usage information
- 16:31 taavi@deploy2002: Finished scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php (duration: 07m 13s)
- 16:31 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 16:27 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
- 16:27 denisse: Running the sre.hosts.decommission cookbook on the alert1001, and alert2001 hosts - T372607
- 16:27 taavi@deploy2002: matmarex, taavi: Continuing with sync
- 16:26 taavi@deploy2002: matmarex, taavi: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:24 taavi@deploy2002: Started scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php
- 16:16 taavi@deploy2002: Finished scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) (duration: 07m 01s)
- 16:11 taavi@deploy2002: zabe, taavi: Continuing with sync
- 16:11 taavi@deploy2002: zabe, taavi: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:09 taavi@deploy2002: Started scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707)
- 16:03 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host wdqs-categories1001.eqiad.wmnet
- 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs-categories1001.eqiad.wmnet with OS bullseye
- 15:46 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:43 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:43 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:41 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:38 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:38 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:37 cdanis@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:35 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:35 cdanis@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:34 cdanis@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:31 cdanis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:31 cdanis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:30 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3a7901e]: T375153 (duration: 01m 59s)
- 15:28 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 15:28 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 15:28 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3a7901e]: T375153
- 15:27 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - T370962
- 15:26 dancy@deploy2002: Finished scap sync-world: Testing T370934 (duration: 03m 19s)
- 15:24 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:23 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:22 dancy@deploy2002: Started scap sync-world: Testing T370934
- 15:18 dancy@deploy2002: Installation of scap version "4.108.0" completed for 210 hosts
- 15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
- 15:14 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
- 15:13 dancy@deploy2002: Installing scap version "4.108.0" for 210 hosts
- 15:12 cdanis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:12 cdanis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:07 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - T370962
- 15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 15:00 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:56 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs-categories1001.eqiad.wmnet with OS bullseye
- 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-categories1001.eqiad.wmnet on all recursors
- 14:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-categories1001.eqiad.wmnet on all recursors
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:44 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1004.wikimedia.org
- 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc1004.wikimedia.org with OS bookworm
- 14:30 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:30 bking@cumin2002: START - Cookbook sre.ganeti.makevm for new host wdqs-categories1001.eqiad.wmnet
- 14:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
- 14:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
- 14:22 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
- 14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for labswiki: Disallow account autocreation (T161859) (duration: 07m 38s)
- 14:17 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:16 urbanecm@deploy2002: urbanecm: Backport for labswiki: Disallow account autocreation (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:14 urbanecm@deploy2002: Started scap sync-world: Backport for labswiki: Disallow account autocreation (T161859)
- 14:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc1004.wikimedia.org with OS bookworm
- 14:11 hashar@deploy2002: Finished scap sync-world: Backport for Remove Maintenance check (T376255) (duration: 07m 27s)
- 14:08 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1004.wikimedia.org on all recursors
- 14:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc1004.wikimedia.org on all recursors
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:07 hashar@deploy2002: hashar: Continuing with sync
- 14:06 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:04 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
- 14:03 hashar@deploy2002: Sync cancelled.
- 14:03 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 14:03 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc1004.wikimedia.org
- 14:01 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
- 13:31 Lucas_WMDE: UTC afternoon backport+config window done
- 13:28 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242) (duration: 10m 32s)
- 13:24 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Continuing with sync
- 13:20 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Backport for Improve sub-ref check to avoid false positives (T376242) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242)
- 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) (duration: 14m 45s)
- 13:16 moritzm: upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
- 13:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
- 13:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821)
- 12:59 moritzm: upload python3-aiohttp-sse-client 0.2.1-0 to apt.wikimedia.org bookworm/ircstream-sse component (needed by the eventstream feature branch of ircstream) T376014
- 12:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
- 12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
- 12:49 hashar@deploy2002: Finished scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) (duration: 07m 01s)
- 12:45 hashar@deploy2002: hashar, zabe: Continuing with sync
- 12:45 hashar@deploy2002: hashar, zabe: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:42 hashar@deploy2002: Started scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255)
- 12:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 12:35 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:14 zabe@deploy2002: Finished scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) (duration: 08m 50s)
- 12:13 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
- 12:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 12:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:09 zabe@deploy2002: zabe: Continuing with sync
- 12:09 zabe@deploy2002: zabe: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
- 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 12:05 zabe@deploy2002: Started scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129)
- 12:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:57 _joe_: restarted rsyslog on kubernetes1045
- 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1005.eqiad.wmnet
- 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
- 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
- 10:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
- 10:17 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:13 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1005.eqiad.wmnet on all recursors
- 10:13 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1005.eqiad.wmnet on all recursors
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:11 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:04 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 10:04 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1005.eqiad.wmnet
- 10:03 elukey@deploy2002: Finished scap sync-world: Backport for Add irc2003 to the irc settings (T376014) (duration: 07m 11s)
- 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1004.eqiad.wmnet
- 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
- 09:59 elukey@deploy2002: elukey: Continuing with sync
- 09:58 elukey@deploy2002: elukey: Backport for Add irc2003 to the irc settings (T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:56 elukey@deploy2002: Started scap sync-world: Backport for Add irc2003 to the irc settings (T376014)
- 09:54 elukey@deploy2002: Finished scap sync-world: Add irc2003 to the network policies (duration: 02m 15s)
- 09:53 elukey@deploy2002: Started scap sync-world: Add irc2003 to the network policies
- 09:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
- 09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
- 09:44 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:44 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:43 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:43 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:42 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:42 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
- 09:31 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to [php-1.43.0-wmf.24]" - T375656
- 09:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships" "Zabe" --reason "per request T376246"
- 09:23 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:23 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1004.eqiad.wmnet on all recursors
- 09:22 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1004.eqiad.wmnet on all recursors
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:21 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:17 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 09:17 jynus@cumin1002: dbctl commit (dc=all): 'Set es2024 to weight 10 as the rest of es-rw hosts T376249', diff saved to https://phabricator.wikimedia.org/P69443 and previous config saved to /var/cache/conftool/dbconfig/20241002-091754-jynus.json
- 09:17 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1004.eqiad.wmnet
- 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-ctrl1004.eqiad.wmnet
- 09:16 elukey@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 09:16 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 09:16 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1004.eqiad.wmnet
- 09:13 vgutierrez: repooling cp3071 and cp3072 after HW maintenance - T374986
- 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[3071-3072].esams.wmnet
- 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp[3071-3072].esams.wmnet
- 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1001.eqiad.wmnet
- 08:57 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1001.eqiad.wmnet
- 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1001.eqiad.wmnet
- 08:55 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1001.eqiad.wmnet
- 08:55 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided) (duration: 00m 52s)
- 08:54 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided)
- 08:36 jayme: removed the label node-role.kubernetes.io/master and the taint node-role.kubernetes.io/master:NoSchedule from all k8s apiservers - T334234
- 08:32 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to all k8s apiservers - T334234
- 08:29 hashar: Restarted stashbot based on instructions at https://wikitech.wikimedia.org/wiki/Tool:Stashbot
- 08:20 hashar@deploy2002: Finished scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967) (duration: 10m 27s)
- 08:16 hashar@deploy2002: hashar, sfaci: Continuing with sync
- 08:12 hashar@deploy2002: hashar, sfaci: Backport for Metrics Platform monotable: Base stream configuration (T373967) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:10 hashar@deploy2002: Started scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967)
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
- 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
- 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 07:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
- 07:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
- 06:50 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1497 hosts
- 06:49 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1497 hosts
- 06:48 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 706 hosts
- 06:48 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 706 hosts
- 02:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2005.codfw.wmnet with OS bookworm
2024-10-01
- 23:42 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s3.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
- 20:34 hashar: UTC late backport window completed
- 20:28 hashar: mwscript purgeList.php --wiki=tlywiki --namespace=4 # T367009
- 20:12 hashar@deploy2002: Finished scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009) (duration: 07m 21s)
- 20:07 hashar@deploy2002: nmw03, hashar: Continuing with sync
- 20:06 hashar@deploy2002: nmw03, hashar: Backport for Update wgMetaNamespace for tlywiki (T367009) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 hashar@deploy2002: Started scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009)
- 20:02 hashar: Restarting CI Jenkins
- 19:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 19:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140) (duration: 09m 03s)
- 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:53 ladsgroup@deploy2002: ladsgroup: Backport for Allow storing of passwords for local users in wikitech (T376140) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140)
- 17:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:00 ladsgroup@deploy2002: taavi, ladsgroup: Continuing with sync
- 15:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:58 ladsgroup@deploy2002: taavi, ladsgroup: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374)
- 15:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
- 15:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
- 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:39 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149 (duration: 01m 07s)
- 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149
- 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149 (duration: 00m 30s)
- 15:03 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149
- 15:02 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 14:45 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to wikikube staging apiservers - T334234
- 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:15 jayme: added the label node-role.kubernetes.io/control-plane= to all k8s apiservers - T334234
- 14:10 moritzm: installing cups security updates
- 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:32 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:31 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:31 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:21 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 12:28 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859) (duration: 07m 51s)
- 12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:23 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=labswiki --undo /home/ladsgroup/T376129.undo.sql DB cluster31 (T376129)
- 12:22 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Allow 'crats to rename local users (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:20 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859)
- 12:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129) (duration: 09m 53s)
- 12:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:09 ladsgroup@deploy2002: ladsgroup: Backport for Wikitech: Connect wikitech to external storage (T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:07 ladsgroup@deploy2002: Started scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129)
- 12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859) (duration: 09m 53s)
- 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:54 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Soft connect wikitech to SUL (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:52 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859)
- 11:51 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 11:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Drop wikitech.php (T371592 T371374) (duration: 07m 32s)
- 11:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:44 ladsgroup@deploy2002: ladsgroup: Backport for Drop wikitech.php (T371592 T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Drop wikitech.php (T371592 T371374)
- 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2003.wikimedia.org
- 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2003.wikimedia.org with OS bookworm
- 11:16 effie: Switching wikitech to k8s - T292707
- 11:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
- 11:09 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
- 11:01 jiji@deploy2002: Finished scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) (duration: 08m 23s)
- 10:56 jiji@deploy2002: jiji: Continuing with sync
- 10:55 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:52 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
- 10:48 jiji@deploy2002: Sync cancelled.
- 10:44 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
- 10:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:40 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2003.wikimedia.org with OS bookworm
- 10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2003.wikimedia.org on all recursors
- 10:15 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2003.wikimedia.org on all recursors
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 10:11 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2003.wikimedia.org
- 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 10:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 09:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:24 jmm@deploy2002: Finished scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) (duration: 08m 07s)
- 09:19 jmm@deploy2002: jmm: Continuing with sync
- 09:19 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 09:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 09:18 jmm@deploy2002: jmm: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:16 jmm@deploy2002: Started scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014)
- 09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69437 and previous config saved to /var/cache/conftool/dbconfig/20241001-090708-ladsgroup.json
- 09:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 08:58 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.25 refs T375656
- 08:46 urbanecm@deploy2002: Finished scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784) (duration: 06m 58s)
- 08:39 urbanecm@deploy2002: Started scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)
- 07:58 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
- 07:54 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
- 07:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
- 07:39 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
- 07:34 kartik@deploy2002: Finished scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979) (duration: 10m 05s)
- 07:30 kartik@deploy2002: kartik, melos: Continuing with sync
- 07:26 kartik@deploy2002: kartik, melos: Backport for Add namespace aliases for scn.wikipedia (T375979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:24 kartik@deploy2002: Started scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979)
- 07:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460) (duration: 18m 15s)
- 07:14 kartik@deploy2002: kartik, abi: Continuing with sync
- 07:09 kartik@deploy2002: kartik, abi: Backport for Enable translation settings banner for Test wikipedia (T372460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460)
- 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 705 hosts
- 06:47 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 705 hosts
- 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 1497 hosts
- 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 1497 hosts
- 06:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T375345
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.22 (duration: 00m 58s)
- 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656 (duration: 48m 36s)
- 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656
- 02:47 eileen: civicrm upgraded from cf27c789 to 28fd5e3b
- 02:17 ejegg: email preference center upgraded from 8ff002ef to e88750e6
- 02:16 ejegg: payments-wiki upgraded from 8d3b8e94 to e88750e6