Ops notifications

From Wikitech

The operations team provides notification of maintenance or upgrades being done, emergency outages, and day to day work. Below is a summary of the different means in use by ops.


Primary channel -- #wikimedia-operations
Use: day-to-day work and discussion happen here. Updates to the Server Admin Log are also made from this channel.
Secondary channel -- #wikimedia-tech
Use: during an outage, someone usually monitors this channel, posts status updates, and watches for reports of problems.
Matters that involve possible security or privacy issues are discussed elsewhere, not in these public channels.

Server Admin Log (SAL)

Use: machine power cycles, service restarts, upgrades, configuration changes, and similar actions are logged to the Server Admin Log. During long outages, brief status updates are posted here as well.


Everything posted to the Server Admin Log by stashbot is also automatically mirrored to the Twitter account @wikimedia_sal.
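Because SAL entries are written from IRC and recorded by stashbot, the Python sketch below shows how such a bot might pick log entries out of channel messages. It assumes stashbot's `!log <message>` command convention; the `SalEntry` type and `parse_sal_entry` function are hypothetical illustrations, not stashbot's actual code.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one SAL entry; stashbot's real schema may differ.
@dataclass
class SalEntry:
    nick: str
    message: str


# Assumed convention: a SAL entry is an IRC message starting with "!log ".
LOG_COMMAND = re.compile(r"^!log\s+(?P<message>.+)$")


def parse_sal_entry(nick: str, text: str) -> Optional[SalEntry]:
    """Return a SalEntry if the IRC message is a !log command, else None."""
    match = LOG_COMMAND.match(text.strip())
    if match is None:
        # Ordinary chat (or a bare "!log" with no message) is not logged.
        return None
    return SalEntry(nick=nick, message=match.group("message").strip())
```

For example, `parse_sal_entry("alice", "!log restarted nginx on example-host")` yields an entry with the message `restarted nginx on example-host`, while ordinary discussion in the channel returns `None` and is not logged.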


Outage post-mortems are published on-wiki as incident reports.


When bugs involving operational issues are opened, follow-up is done in Phabricator, which is publicly available.


No single public email mechanism is currently used for updates during site outages or after recovery. (Should there be one, or do the channels above suffice?)


You can browse Gerrit to see which changes have been merged in ops-related projects, e.g. operations/puppet and the other operations/* repositories.
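Besides the web interface, merged changes can be listed through Gerrit's REST API. The sketch below queries the `/changes/` endpoint with the `status:merged` and `project:` search operators and strips the `)]}'` guard line that Gerrit prepends to JSON responses. It assumes the Wikimedia instance serves its API under `https://gerrit.wikimedia.org/r`; the helper function names are hypothetical.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the Wikimedia Gerrit REST API.
GERRIT_BASE = "https://gerrit.wikimedia.org/r"

# Gerrit prefixes JSON responses with this line to prevent XSSI attacks.
XSSI_PREFIX = ")]}'"


def merged_changes_url(project: str, limit: int = 10) -> str:
    """Build a /changes/ query URL for recently merged changes in one project."""
    query = urlencode({"q": f"status:merged project:{project}", "n": limit})
    return f"{GERRIT_BASE}/changes/?{query}"


def parse_gerrit_json(body: str) -> Any:
    """Strip Gerrit's XSSI guard line and decode the remaining JSON."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_merged_changes(project: str, limit: int = 10) -> Any:
    """Fetch merged changes for a project (requires network access)."""
    with urlopen(merged_changes_url(project, limit), timeout=30) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))
```

For example, `fetch_merged_changes("operations/puppet", limit=5)` would return up to five recently merged changes, each a dict with fields such as `_number` and `subject`.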