Incident documentation/20170126-API Slowdown

From Wikitech
Very much a work in progress: still under heavy research.

On 2017-01-26, from 17:51 to 18:15 (all times UTC), the MediaWiki Action API on Wikimedia wikis slowed down and returned an increased number of 500 responses. Scheduled maintenance was in progress at the time, but it was not expected to cause any user impact. The underlying cause is still being researched.

Summary

Timeline

  • 17:46 paravoid: stopping pybal on lvs1001/lvs1002/lvs1003
  • 17:51 paravoid: replacing asw-c2-eqiad
  • 17:57 elukey: bootstrapping aqs1007-a cassandra instance
  • 18:14 paravoid: rebooting newly provisioned asw-c2-eqiad to enable mixed mode
  • 18:15 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055, 56, 57, 59 (duration: 00m 54s)
  • 18:32 paravoid: starting pybal on lvs1001/lvs1002/lvs1003

Conclusions

More research is needed to understand why the issue happened, how the relevant MediaWiki model works, and whether it has a bug in this particular scenario.

Actionables