Incidents/20160924-ORES
Summary
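The ORES review tool (the ORES extension) could not score edits made during the roughly 14 hours between the rollout of wmf.20 on 2016-09-22 and the quick fix deployed on 2016-09-23. The cause was an int type hint in the extension's includes/Cache.php that made jobs crash (T146461).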
Timeline
This is a step by step outline of what happened to cause the incident and how it was remedied.
- (2016-09-22) SAL: 20:00 thcipriani: rolling out wmf.20 to all wikis
- (2016-09-23) 9:44 The Phabricator task (T146461) is created.
- 9:45 A Gerrit patch fixing the bug in master is submitted.
- 9:47 The patch is merged.
- 9:48 The backport to wmf.20 is submitted.
- 9:51 The backport is merged.
- SAL: 09:58 logmsgbot: hashar@tin Synchronized php-1.28.0-wmf.20/extensions/ORES/includes/Cache.php: No int typehinting (causes jobs to crash) T146461 (duration: 00m 42s)
- SAL: 10:00 Amir1: ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=enwiki
- SAL: 10:05 Amir1: ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=wikidatawiki (T146461) and for 'trwiki', 'plwiki', 'fawiki', 'nlwiki', 'ruwiki', 'ptwiki'
Conclusions
- There should be an alarm that fires when jobs such as ORESFetchScoreJob are not triggered for more than an hour.
- The bug was easy to catch; the ORES extension should have extensive CI tests.
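The one-hour alarm proposed above can be illustrated with a small staleness check. This is a minimal sketch, assuming a hypothetical source that reports the last time each job type ran; it is not Wikimedia's actual monitoring setup, and the names `job_is_stale`, `check_jobs`, and `SomeOtherJob` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the conclusion: alert if a job has not run for over an hour.
MAX_SILENCE = timedelta(hours=1)


def job_is_stale(last_run, now, max_silence=MAX_SILENCE):
    """Return True if the job has not been triggered within max_silence.

    last_run is the timestamp of the most recent run, or None if the job
    has never been seen (hypothetical input; treated as stale).
    """
    if last_run is None:
        return True
    return now - last_run > max_silence


def check_jobs(last_runs, now):
    """Return the sorted names of jobs whose last run is older than the threshold."""
    return sorted(name for name, ts in last_runs.items() if job_is_stale(ts, now))


if __name__ == "__main__":
    now = datetime(2016, 9, 23, 9, 44, tzinfo=timezone.utc)
    # Hypothetical last-run timestamps, for illustration only.
    last_runs = {
        "ORESFetchScoreJob": datetime(2016, 9, 22, 20, 0, tzinfo=timezone.utc),
        "SomeOtherJob": now - timedelta(minutes=5),
    }
    print(check_jobs(last_runs, now))  # ['ORESFetchScoreJob']
```

In a real deployment the alert would be wired into the existing monitoring and paging system rather than printed; the check itself only compares the time since the last run against the threshold.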
Actionables
- Extensive CI tests for ORES extension (task T146560)
- A high failure rate of account creation should trigger an alarm or page people (task T146090)