Switch Datacenter/Coordination
Planning and executing a DC switchover in a non-emergency requires coordinating between various SRE subteams, RelEng, CommRel and others. While we aim to make this a non-event from a user perspective, we're not there yet from an operational perspective.
Scheduling
Ideally, scheduling should start 2 months before the desired date.
- Check the WMF Staff Calendar, global holidays and the deployment yearly calendar for potential conflicts.
- Ask the DBA, DCOps, RelEng, Network Engineering (in Infrastructure Foundations) and CommRel teams to verify that the date works for them.
- Do this by scheduling a kickoff meeting with representatives from the affected teams, where a range of dates can be proposed for the switchover and the switchback. Follow up with them and set a final date the next week.
- CommRel handles the on-wiki communications; you handle the mailing list and Slack announcements.
- Create a Phabricator task (e.g. T281515) and update the Switch Datacenter page with the schedule (use zonestamp links for convenience).
- Typically: Services Tuesday 14:00 UTC, Traffic Tuesday 15:00 UTC, MediaWiki Wednesday 14:00 UTC
- The repool, typically 6+ weeks later, follows the same pattern: Services Tuesday 14:00 UTC, Traffic Tuesday 15:00 UTC, MediaWiki Wednesday 14:00 UTC
- Announce the tentative date to sre-at-large@wikimedia.org, invite comments and concerns, and allow 1 week for responses.
- Once the date is set, announce it on ops and engineering-all@wikimedia.org, as well as in the #engineering-all Slack channel.
- Ask ITS for permission, via their internal email, to post an announcement in the #global-announce Slack channel.
- Send calendar invitations to sre@wikimedia.org
- Add the date and times in the SRE Monday Update under the Service Interruptions - Any other maintenance and expansions? heading
- Once the week is listed on the Deployment calendar, add the events there (example) and mark the surrounding deployment windows as canceled.
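The typical windows listed above can be derived from the Tuesday of the chosen week. The following is a minimal sketch of that arithmetic, not an existing tool: the function name `switch_windows` and the example date are hypothetical, and the 6-week repool offset is only the lower bound of the "6+ weeks" guideline.

```python
from datetime import date, datetime, time, timedelta, timezone


def switch_windows(tuesday: date) -> dict:
    """Return the typical UTC start times for the week of the given Tuesday.

    Hypothetical helper for illustration; the times mirror the
    "Typically" line in the checklist above.
    """
    if tuesday.weekday() != 1:  # Monday is 0, Tuesday is 1
        raise ValueError("expected a Tuesday")

    def at(day: date, hour: int) -> datetime:
        return datetime.combine(day, time(hour, 0), tzinfo=timezone.utc)

    return {
        "Services": at(tuesday, 14),
        "Traffic": at(tuesday, 15),
        "MediaWiki": at(tuesday + timedelta(days=1), 14),
    }


# Example date only (an arbitrary Tuesday), not a real schedule.
switchover = switch_windows(date(2030, 3, 5))
# Assumption: repool exactly 6 weeks later, the minimum of "6+ weeks".
repool = switch_windows(date(2030, 3, 5) + timedelta(weeks=6))

for name, start in switchover.items():
    print(f"{name}: {start.isoformat()}")
```

Keeping every timestamp timezone-aware in UTC avoids ambiguity when the values are later pasted into announcements or calendar invitations.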
2 weeks before the selected date:
- Announce the dates on the wikitech-l mailing list and in the #general Slack channel.
- Coordinate with Volans to ensure any spicerack/wmflib releases are done before they're needed.
1 week before the selected date:
- Send a reminder to ops and engineering-all, as well as to the #engineering-all Slack channel.
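The lead times in this checklist can be turned into concrete calendar dates once a switchover date is chosen. The sketch below is illustrative only: the function name `announcement_milestones` is hypothetical, "2 months" is approximated as 8 weeks, and the example date is arbitrary.

```python
from datetime import date, timedelta


def announcement_milestones(switch_date: date) -> dict:
    """Map each checklist milestone to a date relative to the switchover.

    Hypothetical helper; "2 months" is approximated as 8 weeks.
    """
    return {
        "start scheduling": switch_date - timedelta(weeks=8),
        "announce on wikitech-l and #general": switch_date - timedelta(weeks=2),
        "send reminder": switch_date - timedelta(weeks=1),
    }


# Example date only, not a real schedule.
milestones = announcement_milestones(date(2030, 3, 5))
for label, day in milestones.items():
    print(f"{day.isoformat()}  {label}")
```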