Incidents/20161202-20161201-PageImages
Summary
A change that went live on the train was supposed to add a parameter to the page images API that lets callers select a page image by license type. Until then, the page images API returned only free images. The new parameter would let API callers explicitly include non-free images as well. Due to an implementation issue, the change reset all known free page images, so for most requests the page images API returned no image. As a result, images disappeared for about 3 hours from many of Wikimedia's projects, including our Wikipedia apps, the Wikimedia portal and mobile web search results.
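For readers unfamiliar with the API, here is a minimal sketch of how a caller might request a page image for a given license type. It only builds the request URL and makes no network call. The parameter name `pilicense`, the endpoint and the helper `page_image_url` are assumptions for illustration; this report itself only names the value `any` and the notion of free images.

```python
from urllib.parse import urlencode

# Assumed endpoint, used purely for illustration.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def page_image_url(title: str, license_type: str = "free") -> str:
    """Build a page images query URL for one title (hypothetical helper).

    license_type is "free" (only free images) or "any" (non-free images
    may be included). The parameter name `pilicense` is an assumption.
    """
    if license_type not in ("free", "any"):
        raise ValueError(f"unsupported license type: {license_type!r}")
    params = {
        "action": "query",
        "prop": "pageimages",
        "titles": title,
        "pilicense": license_type,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(page_image_url("Earth"))
    print(page_image_url("Earth", "any"))
```

A caller that must never show non-free images would pass `free` explicitly rather than rely on the default, since the default is exactly what changed during this incident.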
Timeline
- @Deskana commented that images had disappeared from the portal. Asked @ebernhardson and @MaxSem to drop everything and fix it.
- @MaxSem tracked down the offending change
- @ebernhardson pinged the reading web team (Baha and Jon R) to make them aware of his proposed fix
- @jdlrobson, @ebernhardson and @MaxSem chatted and agreed that a patch making `any` image the default was the best short-term solution; the patch was merged
- @ebernhardson SWATed the fix
- The Discovery team enabled a maintenance script to re-populate all page images. It is estimated that it will take 3 weeks to run. In the meantime, API subscribers will be returned any image by default in all Wikimedia projects (which may include non-free images). If they request free images they will get only free images, but many images will be missing until the maintenance script has completed. Requests for an image with license type `any` may also return a worse image than the best one available.
Conclusions
- Had this change been enabled explicitly by a config change, the time between it going live and the problem being detected would have been shorter, and so would the disruption. We should consider putting big changes like this behind a config flag in future.
- Changes to the PageImages extension should probably be run past the Discovery team in future, and this one should have been subjected to more thorough review.
- Big changes of this kind should be thoroughly tested on the beta cluster before being enabled on production servers.
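The first conclusion can be illustrated with a minimal sketch of config-gated behaviour. This is not the PageImages implementation: the flag name `enable_license_filter`, the `Config` class and `select_page_image` are all hypothetical, and candidates are modelled simply as `(file name, is_free)` pairs in rank order.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Candidate = Tuple[str, bool]  # (file name, is_free), best candidate first


@dataclass
class Config:
    # Hypothetical flag: off by default, so deploying the code changes nothing.
    enable_license_filter: bool = False


def select_page_image(
    candidates: Sequence[Candidate],
    config: Config,
    license_type: str = "free",
) -> Optional[str]:
    """Pick a page image, using the new license logic only when flagged on."""
    if config.enable_license_filter and license_type == "any":
        # New behaviour: the best candidate regardless of license.
        return candidates[0][0] if candidates else None
    # Old behaviour (and the "free" case): the best free candidate, if any.
    return next((name for name, is_free in candidates if is_free), None)


if __name__ == "__main__":
    images = [("logo.png", False), ("photo.jpg", True)]
    print(select_page_image(images, Config(), "any"))
    print(select_page_image(images, Config(enable_license_filter=True), "any"))
```

With a flag like this, the new behaviour would be switched on by a separate, easily reverted config change, which narrows down the cause when something breaks right after it is enabled.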