Switch Datacenter/Coordination
Planning and executing a DC switchover outside an emergency requires coordination among various SRE subteams, RelEng, CommRel and others. While we aim to make this a non-event from a user perspective, we're not there yet from an operational perspective.
Who is involved?
- CommRel handles the on-wiki communications
- SRE handles the mailing list, Slack, and IRC announcements
Scheduling
Scheduling is now fixed. See Switch Datacenter/Recurring, Equinox-based, Data Center Switchovers for the policy change and Switch Datacenter/Switchover Dates for the pre-calculated dates up to 2050.
Ensure both switchovers of the current year are present in Deployments/Yearly_calendar. Note that this does not ensure inclusion in the weekly Deployment calendar, which must be done manually (see below).
Tasks
- Create a Phabricator task (e.g. T357547) and update the Switch Datacenter page with the schedule (use zonestamp links for convenience).
- Announce dates on chats: #engineering-all (Slack), #product-tech-dept (Slack), #wikimedia-sre (IRC) (see Switch Datacenter/Coordination#Notes)
- Create a CommRel Phabricator task (see Switch Datacenter/Coordination#Notes)
- Announce dates on the following lists (see Switch Datacenter/Coordination#Notes):
- Ask ITS, via their internal email, for permission to post an announcement on the #global-announce Slack channel
- Send calendar invitations to sre@wikimedia.org
- Add the dates and times to the SRE Monday Update under the "Service Interruptions - Any other maintenance and expansions?" heading
- Make sure the Switchover is listed in the Deployment calendar (example diffs: traffic and services, MediaWiki).
- Coordinate with Volans to ensure any spicerack/wmflib releases are done before they're needed
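The zonestamp links mentioned in the first task embed a Unix timestamp in the URL, as seen in the templates on this page. A minimal sketch of deriving the link and its human-readable label from a date, assuming the default 14:00 UTC target time; the helper names `zonestamp_link` and `zonestamp_label` are hypothetical and not part of any existing tool:

```python
from datetime import date, datetime, time, timezone

# Base URL as it appears in the templates on this page.
ZONESTAMP_BASE = "https://zonestamp.toolforge.org"


def zonestamp_link(day: date, hour_utc: int = 14) -> str:
    """Return the zonestamp URL for `day` at `hour_utc`:00 UTC (hypothetical helper)."""
    moment = datetime.combine(day, time(hour=hour_utc), tzinfo=timezone.utc)
    return f"{ZONESTAMP_BASE}/{int(moment.timestamp())}"


def zonestamp_label(day: date, hour_utc: int = 14) -> str:
    """Return a label in the style used by the templates (hypothetical helper)."""
    # %A and %B give English names under the default C locale.
    return f"{day.strftime('%A')}, {day.day} {day.strftime('%B %Y')} @{hour_utc:02d}:00 UTC"


if __name__ == "__main__":
    services_day = date(2024, 3, 19)
    print(f"[[ {zonestamp_link(services_day)} | {zonestamp_label(services_day)} ]]")
```

Deriving the weekday from the date rather than typing it by hand avoids a label that disagrees with the timestamp in the link.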
Notes
Please bold the important details, i.e. dates and DCs, in all communications. If you adjust the default 14:00 UTC target time for this switchover, verify that every time in the communications you send from these templates is updated. Note: MoveComms may need to update various translations of the note, so a kind reminder can be helpful.
- Chats: adjust the dates, Phab tasks and DCs accordingly
Dear engineers,

We are about a month away from our (now standardised) DC switchover, as reflected in the https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar.

Important Dates
* **Services:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
* **Traffic:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
* **MediaWiki:** [[ https://zonestamp.toolforge.org/1710943200 | Wednesday, 20 March 2024 @14:00 UTC ]]
* **codfw repool:** [[ https://zonestamp.toolforge.org/1711548000 | Wednesday, 27 March 2024 @14:00 UTC ]]

If you have any related work, please file your tasks under T357547
- MoveComms Phabricator task: adjust the dates, weeks, and DCs accordingly, and tag #MoveComms-Support (tagging should be sufficient for them to pick it up)
MoveComms support for Northward Datacentre Switchover (March 2024)
Dear MoveComms,

We are planning a datacentre switchover for the week of March 18th (week 12) with the following schedule:
- **Services:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **Traffic:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **MediaWiki:** [[ https://zonestamp.toolforge.org/1710943200 | Wednesday, 20 March 2024 @14:00 UTC ]]

The expected impact is 2-3 minutes of read-only on Wednesday, 20 March 2024 @ 14:00 UTC.

We continue to follow the process described in [[ https://wikitech.wikimedia.org/wiki/Switch_Datacenter/Recurring,_Equinox-based,_Data_Center_Switchovers | Recurring, Equinox-based, Data Center Switchovers ]], in particular:
- There is no switchback! We are staying in **eqiad** until the next switchover.
- Future switchovers are predictable and take place every 6 months; always on the week of an equinox.

Let #serviceops know if you need more info on the changes.

Thank you!
- Mailing Lists: adjust the dates, Phab tasks and DCs accordingly
Northward Datacentre Switchover (March 2024)
Dear all,

On Wednesday March 20th 2024, the SRE team will run a planned datacentre switchover, moving all wikis from codfw to eqiad. This is an important periodic test of our tools and procedures, to ensure the wikis will continue to be available even in the event of major technical issues in our primary home. It also gives all our SRE and ops teams a chance to do maintenance and upgrades on systems in codfw that normally run 24 hours a day.

The switchover process requires a brief read-only period for all Foundation-hosted wikis, which will start on Wednesday March 20th 2024 @ 14:00 UTC, and will last for just a few minutes while we execute the migration as efficiently as possible. All our public and private wikis will be continuously available for reading, as usual, but editing will be unavailable during the process. Users will see a notification of the upcoming maintenance, and anyone still editing will be asked to try again in a few minutes.

CommRel will soon begin notifying communities of the read-only window. If you like, you can follow along on the day in the public #wikimedia-operations channel on IRC. To report any issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator ticket with the #datacenter-switchover tag (https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Datacenter-Switchover); we'll be monitoring closely for reports of trouble during and after the switchover.

The switchover and its preparation will be tracked under https://phabricator.wikimedia.org/T357547.

On behalf of the SRE team, please excuse the disruption, and we would like to thank everyone in various departments who are involved in planning this work. If you have any questions, please reply directly to this email.

Kind Regards,
An Engineer
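
Since the templates pair each zonestamp timestamp with a hand-written date, a quick consistency check before sending can catch labels that were not updated. The following is a rough sketch under stated assumptions: it assumes labels follow the "Weekday, D Month YYYY @HH:MM UTC" shape used above, and the function name `find_label_mismatches` is hypothetical rather than part of any existing tooling.

```python
import re
from datetime import datetime, timezone

# Matches "<zonestamp URL> | Weekday, 19 March 2024 @14:00 UTC".
# Also tolerates a URL-encoded pipe (%7C) and ordinal suffixes such as "27th".
LINK_RE = re.compile(
    r"zonestamp\.toolforge\.org/(\d+)\s*(?:\||%7C)\s*"
    r"(\w+), (\d{1,2})(?:st|nd|rd|th)? (\w+) (\d{4}) @\s?(\d{2}):(\d{2}) UTC"
)


def find_label_mismatches(text: str) -> list[str]:
    """Return one message per link whose label disagrees with its timestamp."""
    problems = []
    for match in LINK_RE.finditer(text):
        ts, weekday, day, month, year, hour, minute = match.groups()
        actual = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        written = (weekday, int(day), month, int(year), int(hour), int(minute))
        expected = (
            actual.strftime("%A"), actual.day, actual.strftime("%B"),
            actual.year, actual.hour, actual.minute,
        )
        if written != expected:
            problems.append(
                f"{ts}: label does not match "
                f"{actual.strftime('%A, %d %B %Y @%H:%M UTC')}"
            )
    return problems
```

Running `find_label_mismatches` over a drafted announcement returns an empty list when every label agrees with its link; anything else is worth a second look before the message goes out.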