Incidents/2018-03-02 Train

Summary

Train for 1.31.0-wmf.23 was rolled back on two occasions:

  • 2018-02-28 05:43:37, because deletion logs on MediaWiki were attributing deletions to the wrong users (task T188479)
  • 2018-02-28 22:11:xx, because of a flood of notices and because every page listed in Special:Newpages showed the current date and time (task T188555)

Timeline

T188479

  • 05:21 Stemoc reported in #mediawiki that the deletion log (Special:Log/delete) was showing the wrong user
  • legoktm investigated and created a task. Example log entry: 05:29, 28 February 2018 Dharmadeepa V (talk | contribs | block) deleted page User:Dharmadeepa V (spam (this is legoktm)) (view/restore)
  • 2018-02-28 05:39:05 <legoktm> I'd suggest reverting the train, like immediately
  • 05:43:37 +logmsgbot | !log demon@tin rebuilt and synchronized wikiversions files: (no justification provided)

T188555

  • 1.31.0-wmf.23 was rolled out to group1 wikis: 2018-02-28 21:56 <thcipriani@tin> rebuilt and synchronized wikiversions files: Group1 to 1.31.0-wmf.23
  • 2018-02-28 21:59 thcipriani noticed an increased error rate, with notices pointing to stdclass::$rc_timestamp, and created task T188555
  • thcipriani rolled group1 back to 1.31.0-wmf.22: 2018-02-28 22:11 <thcipriani@tin> rebuilt and synchronized wikiversions files: Group1 back to 1.31.0-wmf.22 T188555
  • Overnight the problem was resolved and a patch was merged into master. The change was deployed after the train the following day: [2018-03-01T20:15:45Z] <thcipriani@tin> Synchronized php-1.31.0-wmf.23/includes/specials/pagers/NewPagesPager.php: SWAT: NewPagesPages: Use array_merge rather than + for RC query info fields T188555 (duration: 01m 14s)
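
The fix message above contrasts PHP's `+` array operator with `array_merge`. The sketch below models the two behaviours in Python to show how a field such as rc_timestamp could be dropped from a query, which would be consistent with the notices about stdclass::$rc_timestamp. It assumes the field lists were numerically indexed. The helper names, the field lists, and the operand order are hypothetical illustrations, not the actual NewPagesPager code.

```python
def php_list(*values):
    """Model a numerically indexed PHP array as a dict keyed 0..n-1."""
    return dict(enumerate(values))


def php_union(left, right):
    """Model PHP's `$left + $right`: a key already present in $left wins."""
    result = dict(left)
    for key, value in right.items():
        result.setdefault(key, value)  # right-hand value is used only for new keys
    return result


def php_array_merge(left, right):
    """Model PHP's array_merge() on integer keys: values are appended and renumbered."""
    return php_list(*left.values(), *right.values())


# Hypothetical field lists (assumption: the real lists differ).
pager_fields = php_list("length", "rev_id")
rc_fields = php_list("rc_timestamp", "rc_title", "rc_user")

# With `+`, indexes 0 and 1 collide, so "rc_timestamp" and "rc_title" are lost.
union_fields = list(php_union(pager_fields, rc_fields).values())

# With array_merge, every field survives.
merged_fields = list(php_array_merge(pager_fields, rc_fields).values())

print(union_fields)
print(merged_fields)
```

In this model a row fetched with the `+` field list has no rc_timestamp column, so reading it would raise a notice, and code that falls back to "now" for a missing timestamp would show the current date and time.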

Conclusions

  • A test case probably should have caught the first problem that led to an emergency rollback.
  • The second problem seems like something automated browser tests or manual testing could have caught.

Actionables

  • phab:T188773 - Test to validate deletion log entries and ArticleDeleteComplete hook performers