Incident documentation/2021-04-15 appserver latency

From Wikitech
Jump to navigation Jump to search

document status: draft

Summary

From 07:31 to 08:16 UTC, there was increased latency and error rates for some MediaWiki cache misses and authenticated requests. The incident predominantly affected API users and bots on commons.wikimedia.org. API request latency for some requests went up by 5 seconds, and error rates upto 25%.

The issue was found to be caused by a bot causing increased load on the commonswiki databases, and consuming API webserver resources. Later in the time range, regular users and other wikis were affected as well through cross-wiki features involving Commons.

Actionables

Tracking task: https://phabricator.wikimedia.org/T280232