Switch Datacenter/Coordination

From Wikitech

Planning and executing a DC switchover in a non-emergency requires coordinating between various SRE subteams, RelEng, CommRel and others. While we aim to make this a non-event from a user perspective, we're not there yet from an operational perspective.

Scheduling

Ideally, start this process 2 months before the desired date.

  • Check the WMF Staff Calendar, global holidays and the deployment yearly calendar for potential conflicts.
  • Ask the DBA, DCOps, RelEng, Network Engineering (in Infrastructure Foundations) and CommRel teams to confirm that the date works for them.
    • Do this by scheduling a kickoff meeting with representatives from the affected teams, where a range of dates can be proposed for the switchover and the switchback. Follow up with them and set a final date the following week.
    • CommRel handles the on-wiki communications; you handle the mailing list and Slack announcements.
  • Create a Phabricator task (e.g. T281515) and update the Switch Datacenter page with the schedule (use zonestamp links for convenience).
    • Typically: Services Tuesday 14:00 UTC, Traffic Tuesday 15:00 UTC, MediaWiki Wednesday 14:00 UTC
    • Same for the repool: Services Tuesday 14:00 UTC, Traffic Tuesday 15:00 UTC, MediaWiki Wednesday 14:00 UTC
      • Typically 6+ weeks later
  • Announce the tentative date to sre-at-large@wikimedia.org and invite comments and concerns. Allow 1 week for comments.
  • Once the dates are set, announce them on ops and engineering-all@wikimedia.org, as well as in the #engineering-all Slack channel.
  • Ask ITS for permission, via their internal email, to post an announcement in the #global-announce Slack channel.
  • Send calendar invitations to sre@wikimedia.org
  • Add the date and times to the SRE Monday Update under the "Service Interruptions - Any other maintenance and expansions?" heading.
  • Once the week is listed on the Deployment calendar, add the events there (example) and mark the surrounding deployment windows as canceled.
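
The typical slots listed above (Services Tuesday 14:00 UTC, Traffic Tuesday 15:00 UTC, MediaWiki Wednesday 14:00 UTC, with the repool typically 6+ weeks later) can be turned into concrete timestamps for the Phabricator task and calendar invitations. The following is only a minimal sketch: the function names, the example date, and the default of exactly 6 weeks are illustrative assumptions, not part of any existing tooling.

```python
from datetime import date, datetime, time, timedelta, timezone

# (event name, days after the Monday of the week, hour in UTC),
# copied from the "typical" schedule in the list above.
TYPICAL_SLOTS = [
    ("Services", 1, 14),   # Tuesday 14:00 UTC
    ("Traffic", 1, 15),    # Tuesday 15:00 UTC
    ("MediaWiki", 2, 14),  # Wednesday 14:00 UTC
]


def week_schedule(any_day_in_week: date) -> list[tuple[str, datetime]]:
    """Return the typical event times for the week containing the given day."""
    # Normalize to the Monday of that week (weekday() is 0 for Monday).
    monday = any_day_in_week - timedelta(days=any_day_in_week.weekday())
    return [
        (
            name,
            datetime.combine(
                monday + timedelta(days=offset), time(hour), tzinfo=timezone.utc
            ),
        )
        for name, offset, hour in TYPICAL_SLOTS
    ]


def full_schedule(switchover_week: date, weeks_until_repool: int = 6) -> dict:
    """Return switchover and repool events.

    weeks_until_repool defaults to 6 as an assumption; the page only says
    "typically 6+ weeks later", so adjust it to the agreed date.
    """
    return {
        "switchover": week_schedule(switchover_week),
        "repool": week_schedule(switchover_week + timedelta(weeks=weeks_until_repool)),
    }


if __name__ == "__main__":
    # Hypothetical example week; replace with the date agreed with the teams.
    for phase, events in full_schedule(date(2024, 3, 20)).items():
        for name, when in events:
            print(f"{phase:10} {name:10} {when.isoformat()}")
```

Keeping every timestamp timezone-aware in UTC avoids ambiguity when the times are pasted into announcements or converted to zonestamp links.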

2 weeks before the selected date:

  • Announce the dates on the wikitech-l mailing list and in the #general Slack channel.
  • Coordinate with Volans to ensure that any spicerack/wmflib releases are done before they're needed.

1 week before the selected date: