Perhaps it wasn't the index at all
Jynus said on IRC:
[11:09] <jynus> honestly, people are focusing on the index [11:09] <jynus> and I don't think that is the problem [11:09] <jynus> and I don't mean that figuratively [11:09] <jynus> like, "it is the process" [11:09] <jynus> lots of changes where ongoing on that table [11:10] <jynus> and the query optimizer may have decided to change its plan [11:10] <jynus> literally one table column was being blanked [11:10] <jynus> and the query contains like 'b%' [11:11] <jynus> which is a #1 case of "data changes changing the query plan"
This would explain why the problem didn't hit immediately after removing the index, and the index seemed unused. Perhaps it was unused. We were just blanking more and more rows, changing the cardinality of an index until the query plan tipped over. -- daniel (talk) 09:28, 25 May 2018 (UTC)
Please improve documentation for database monitoring
I asked on IRC how devs can check if and how an index is used. Jygnus pointed me to . To avoid issues like this in the future, it would be great if people who search for "unused index" or "index usage" or "database profiling" on wikitech could find some info on performance_schema. -- daniel (talk) 09:31, 25 May 2018 (UTC)
Update: it seems that the relevant "sys" schema is only available to the "wikiadmin" group. Access should either be granted to all deployers, or a process should be installed to request queries against that schema. -- daniel (talk) 11:32, 25 May 2018 (UTC)