For about 20 hours on the 28th and 29th of July, http->https redirects from varnish were disabled due to an error in the redirect regex. The error was deployed shortly before a MediaWiki config change (setting $wgSecureLogin = false) that disabled the application-level logic which checks for http and redirects to https in security-critical situations. With both layers off, nothing forced logins onto https.
Since Aug 2013, we have redirected users to https for security-critical functions. Because we have done this for the past two years without incident, it's likely that users no longer vigilantly check for https before entering their passwords; during this incident we saw 14,567 logins over http (cleartext).
Although the likelihood that an attacker would be able to exploit this issue is low, a stolen password for an account in any sysop or checkuser group could have a high impact on the site's security or user privacy, especially since we do not currently provide any means of two-factor authentication to our users. Due to WMF's strong privacy stance, we do not log which user accounts are logging in at any given time in a way that can be easily correlated with webrequest logs. As a result, we were only able to correlate about 5,000 of the cleartext logins with wiki user accounts. Of the identified accounts, 12 were members of various sysop groups. Those users were notified of the incident with a recommendation that they change their password.
During this incident, only 23% of logins were over HTTP. We believe that a combination of HSTS and canonical https links prevented most users from using HTTP to log in.
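The figures above imply a rough total login volume for the window. The following is a back-of-the-envelope sketch only: the 23% share is a rounded figure and "about 5,000" is approximate, so the derived numbers are estimates, not measured values.

```python
# Figures taken from the report; everything derived from them is an estimate.
http_logins = 14_567          # cleartext logins observed during the incident
http_share = 0.23             # rounded share of logins that went over HTTP
identified_logins = 5_000     # "about 5,000" cleartext logins tied to accounts

# Estimated total logins during the window (HTTP + HTTPS).
estimated_total = round(http_logins / http_share)

# Estimated logins that still went over HTTPS.
estimated_https = estimated_total - http_logins

# Approximate fraction of cleartext logins that could be correlated.
identified_fraction = identified_logins / http_logins

print(f"estimated total logins: ~{estimated_total:,}")
print(f"estimated https logins: ~{estimated_https:,}")
print(f"cleartext logins correlated: ~{identified_fraction:.0%}")
```

Because 23% is rounded, the total is only good to roughly the nearest thousand.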
- 16:54 UTC Jul 28 - https://gerrit.wikimedia.org/r/#/c/227455/ deployed, breaking redirect regex in varnish
- 18:56 UTC Jul 28 - https://gerrit.wikimedia.org/r/#/c/219265/ deployed, setting $wgSecureLogin=false, application layer logic stops redirecting
- 15:14 UTC Jul 29 - email to firstname.lastname@example.org notifying about lack of redirects
- 15:29 UTC Jul 29 - https://gerrit.wikimedia.org/r/#/c/227729/ was merged, fixing the regex
- 16:09 UTC Jul 29 - https://gerrit.wikimedia.org/r/#/c/227740 was merged, re-adding application-level detection and redirection
- 06:41 UTC Jul 30 - email to identified, affected sysop accounts, notifying of incident
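The "about 20 hours" figure can be checked against the timeline. The sketch below assumes the year is 2015 (inferred from "Since Aug 2013 ... the past two years") and treats the exposure window as starting when the second layer was disabled and ending when the varnish regex was fixed; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

YEAR = 2015  # assumption: inferred from the report, not stated in the timeline


def utc(day, hour, minute):
    """Build a UTC timestamp in July of the assumed year."""
    return datetime(YEAR, 7, day, hour, minute, tzinfo=timezone.utc)


varnish_broken = utc(28, 16, 54)         # redirect regex broken in varnish
app_redirect_off = utc(28, 18, 56)       # $wgSecureLogin set to false
reported = utc(29, 15, 14)               # email reporting missing redirects
varnish_fixed = utc(29, 15, 29)          # regex fix merged
app_redirect_restored = utc(29, 16, 9)   # application-level redirect re-added

# Window in which neither layer redirected http logins to https.
exposure = varnish_fixed - app_redirect_off

# Time from the report to the first fix.
time_to_fix = varnish_fixed - reported

print("exposure window:", exposure)
print("exposure in hours:", round(exposure / timedelta(hours=1), 2))
print("report to fix:", time_to_fix)
```

Between 16:54 and 18:56 on Jul 28 the application layer was still redirecting, which is why the window starts at the second deploy rather than the first.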
In setting $wgSecureLogin to false on WMF sites, two security principles were in conflict: keeping our security-critical codebase small and simple vs. having overlapping controls for security-critical functions in case one fails. Keeping $wgSecureLogin enabled caused a non-trivial amount of code to be run on every web request. Disabling the option simplifies a significant portion of the user login process. However, in light of this incident, we should have invested in simplifying the login process while maintaining the overlapping control.
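The value of the overlapping control can be illustrated with a toy model. This is not the varnish or MediaWiki code; the function names, the patterns, and the example.org URL are all hypothetical, and the "broken" pattern simply stands in for whatever the actual regex error was.

```python
import re


def to_https(url):
    """Rewrite an http:// URL to https://."""
    return "https://" + url[len("http://"):]


def edge_redirect(url, pattern):
    """Edge-layer control (hypothetical): redirect URLs matching `pattern`."""
    return to_https(url) if re.match(pattern, url) else None


def app_redirect(url, secure_login_enabled):
    """Application-layer control (hypothetical), gated by a config flag."""
    if secure_login_enabled and url.startswith("http://"):
        return to_https(url)
    return None


def handle(url, pattern, secure_login_enabled):
    """Return the redirect target, or None if the request is served as-is."""
    return edge_redirect(url, pattern) or app_redirect(url, secure_login_enabled)


GOOD_PATTERN = r"^http://"
BROKEN_PATTERN = r"^htp://"  # hypothetical typo; never matches real requests
login_url = "http://example.org/wiki/Special:UserLogin"

# One layer broken, the other still on: the login is still redirected.
print(handle(login_url, BROKEN_PATTERN, secure_login_enabled=True))
# Both layers off, as in this incident: the login is served over http.
print(handle(login_url, BROKEN_PATTERN, secure_login_enabled=False))
```

Either control alone is enough to redirect; cleartext logins only get through when both are off.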
The high number of users logging in over http also validates our concern with preventing SSL-stripping attacks. Ops should continue their work implementing HSTS and getting it preloaded.
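For reference, HSTS is delivered as a Strict-Transport-Security response header. Below is a minimal sketch of building such a header; the helper name and defaults are illustrative, not WMF's configuration. The one-year max-age is an assumed value, and the actual preload-list requirements should be checked before submission.

```python
def hsts_header(max_age_seconds=31_536_000, include_subdomains=True, preload=True):
    """Return a (name, value) pair for a Strict-Transport-Security header.

    Hypothetical helper: the defaults are assumptions for illustration.
    """
    directives = [f"max-age={max_age_seconds}"]
    if include_subdomains:
        directives.append("includeSubDomains")
    if preload:
        # Signals consent to inclusion in browser preload lists.
        directives.append("preload")
    return ("Strict-Transport-Security", "; ".join(directives))


name, value = hsts_header()
print(f"{name}: {value}")
```

Browsers only honor this header when it arrives over https. A first visit over http is therefore still exposed to stripping unless the domain is preloaded.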