Ops notifications
The operations team announces maintenance and upgrades, emergency outages, and day-to-day work through several channels. Below is a summary of the channels ops uses.
IRC
- Primary channel -- #wikimedia-operations
- Use: day-to-day work and discussion happen here. Updates to the Server Admin Log are made from here.
- Secondary channel -- #wikimedia-tech
- Use: during an outage, someone usually monitors this channel, posts updates to it, and watches for reports of problems.
- Other
- For sensitive matters that involve possible security or privacy issues, discussion is carried out elsewhere, for obvious reasons.
Server Admin Log (SAL)
Use: power cycling of machines, service restarts, upgrades, configuration changes, and similar actions are logged to the Server Admin Log. During long outages, brief status updates are posted here as well.
Everything posted to the Server Admin Log by stashbot is also automatically mirrored to the Twitter account @wikimedia_sal.
Mastodon
Likewise, Server Admin Log entries are also mirrored to the Mastodon account @wikimedia_sal@botsin.space.
Wiki
Outage post-mortems are published on-wiki as incident reports.
Phabricator
When bugs involving operational issues are opened, follow-up happens here. Phabricator is publicly available.
There is no single public email mechanism currently in use for updates during site outages or after recovery. (Should there be, or do the above cover us?)
Gerrit
You can look at Gerrit changes to see what has been merged in ops-related projects, for example operations/puppet and the other operations/* repositories.
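Merged changes can also be listed programmatically. The following is a minimal sketch, not an official tool: it assumes the Gerrit instance is reachable at https://gerrit.wikimedia.org/r and uses Gerrit's standard REST `/changes/` endpoint. The function names (`build_query_url`, `parse_changes`, `recent_merges`) are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Gerrit instance; adjust if it differs.
GERRIT_BASE = "https://gerrit.wikimedia.org/r"

# Gerrit prepends this line to JSON responses as an XSSI guard.
XSSI_PREFIX = ")]}'"


def build_query_url(project, limit=10, base=GERRIT_BASE):
    """Build a /changes/ URL that lists merged changes for one project."""
    params = urlencode({"q": f"status:merged project:{project}", "n": limit})
    return f"{base}/changes/?{params}"


def parse_changes(body):
    """Strip the XSSI prefix, if present, and decode the JSON list."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def recent_merges(project, limit=10):
    """Fetch recently merged changes (requires network access)."""
    with urlopen(build_query_url(project, limit), timeout=30) as resp:
        return parse_changes(resp.read().decode("utf-8"))
```

Calling `recent_merges("operations/puppet", limit=5)` should return a list of change records, each carrying fields such as `_number` and `subject`. Other operations/* repositories can be queried by changing the project name.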