Ops notifications

From Wikitech

The operations team announces maintenance, upgrades, emergency outages, and day-to-day work through several channels. The channels currently in use are summarized below.

IRC

Primary channel -- #wikimedia-operations
Use: day-to-day work and discussion happen here. The Server Admin Log is also updated from this channel.
Secondary channel -- #wikimedia-tech
Use: during an outage, someone usually monitors this channel, posts updates, and watches for problem reports.
Other
Discussion of sensitive matters that may involve security or privacy issues takes place elsewhere.

Server Admin Log (SAL)

Use: power cycles, service restarts, upgrades, configuration changes, and similar actions are logged to the Server Admin Log. During long outages, brief status updates are posted here as well.

Twitter

Everything posted to the Server Admin Log by stashbot is also automatically mirrored to the Twitter account @wikimediatech.

Wiki

Outage post-mortems are published on-wiki as incident reports.

Phabricator

When bugs involving operational issues are opened, follow-up happens in Phabricator, which is publicly available.

Email

There is no single public email mechanism currently in use for updates during site outages or after recovery. (Should there be, or do the above cover us?)

Gerrit

You can browse Gerrit changes to see what has been merged in ops-related projects, for example operations/puppet and the other operations/* repositories.
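
The same information can also be pulled programmatically. The following is a minimal sketch, not an official tool: it uses Gerrit's REST `/changes/` endpoint to list recently merged changes in one project. The base URL `https://gerrit.wikimedia.org/r` and the helper names (`build_query_url`, `parse_changes`, `recent_merges`) are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Gerrit instance is served at this base URL.
GERRIT_BASE = "https://gerrit.wikimedia.org/r"

# Gerrit prefixes its JSON responses with this line to prevent XSSI attacks.
XSSI_PREFIX = ")]}'"


def build_query_url(project, limit=5, base=GERRIT_BASE):
    """Return a /changes/ URL that lists merged changes in one project."""
    params = urlencode({"q": f"project:{project} status:merged", "n": limit})
    return f"{base}/changes/?{params}"


def parse_changes(body):
    """Strip the XSSI prefix and return (change number, subject) pairs."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return [(change["_number"], change["subject"]) for change in json.loads(body)]


def recent_merges(project, limit=5):
    """Fetch recently merged changes for a project over the network."""
    with urlopen(build_query_url(project, limit), timeout=10) as resp:
        return parse_changes(resp.read().decode("utf-8"))
```

Under these assumptions, calling `recent_merges("operations/puppet")` returns a short list of change numbers and subject lines. The URL building and the parsing are kept in separate functions so that each can be checked without network access.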