Switch Datacenter/Coordination

Planning and executing a DC switchover in a non-emergency requires coordinating between various SRE subteams, RelEng, CommRel and others. While we aim to make this a non-event from a user perspective, we're not there yet from an operational perspective.

Who is involved?

  • CommRel handles the on-wiki communications
  • SRE handles the mailing list, Slack, and IRC announcements

Scheduling

Scheduling is now fixed. See Switch Datacenter/Recurring, Equinox-based, Data Center Switchovers for the policy change and Switch Datacenter/Switchover Dates for the pre-calculated dates up to 2050.


Ensure both switchovers of the current year are present in Deployments/Yearly_calendar.

Tasks

Notes

Please take the time to bold the important stuff in all communications, i.e. dates and DCs.
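
When adjusting the dates in the templates below, the zonestamp links and the week number can be derived from the date rather than edited by hand. The following is a minimal sketch, not an official tool: the helper names `zonestamp_link` and `iso_week` are hypothetical, the link layout simply mirrors the `[[ URL | label ]]` form used in the templates, and it assumes that "week 12" in the CommRel template refers to the ISO 8601 week number.

```python
from datetime import datetime, timezone

# Base URL taken from the links in the templates on this page.
ZONESTAMP_BASE = "https://zonestamp.toolforge.org"


def zonestamp_link(when: datetime) -> str:
    """Return a Remarkup link to zonestamp for a timezone-aware datetime.

    Hypothetical helper: the label format copies the one used in the
    templates, e.g. "Tuesday, 19 March 2024 @14:00 UTC".
    """
    if when.tzinfo is None:
        # Refuse naive datetimes so local time is never mistaken for UTC.
        raise ValueError("pass a timezone-aware datetime")
    when_utc = when.astimezone(timezone.utc)
    epoch = int(when_utc.timestamp())  # Unix timestamp used in the URL
    label = (
        f"{when_utc:%A}, {when_utc.day} {when_utc:%B %Y} "
        f"@{when_utc:%H:%M} UTC"
    )
    return f"[[ {ZONESTAMP_BASE}/{epoch} | {label} ]]"


def iso_week(when: datetime) -> int:
    """Return the ISO 8601 week number (assumed meaning of 'week NN')."""
    return when.isocalendar()[1]


if __name__ == "__main__":
    services = datetime(2024, 3, 19, 14, 0, tzinfo=timezone.utc)
    print(zonestamp_link(services))
    print(iso_week(datetime(2024, 3, 18, tzinfo=timezone.utc)))
```

For the March 2024 Services date this produces the timestamp 1710856800, matching the link already used in the templates. Generating the weekday from the date also avoids mismatches between the weekday and the day of the month.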

  • Chats: adjust the dates, Phab tasks and DCs accordingly
Dear engineers, 

We are about a month away from our (now standardised) DC switchover, as reflected in the yearly deployments calendar: https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar

Important Dates
* **Services:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
* **Traffic:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
* **MediaWiki:** [[ https://zonestamp.toolforge.org/1710943200 | Wednesday, 20 March 2024 @14:00 UTC ]]

* **codfw repool:** [[ https://zonestamp.toolforge.org/1711548000 | Wednesday, 27 March 2024 @14:00 UTC ]]


If you have any related work, please file your tasks under T357547.
  • CommRel Phabricator task: adjust the dates, weeks, and DCs accordingly, and tag #CommRel-Specialists-Support
CommRel support for Northward Datacentre Switchover (March 2024)

Dear CommRel,

We are planning a datacentre switchover for the week of March 18th (week 12) with the following schedule:

- **Services:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **Traffic:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **MediaWiki:** [[ https://zonestamp.toolforge.org/1710943200 | Wednesday, 20 March 2024 @14:00 UTC ]]

The expected impact is 2-3 minutes of read-only on Wednesday, 20 March 2024 @ 14:00 UTC.

Note that we are implementing the changes described in [[ https://wikitech.wikimedia.org/wiki/Switch_Datacenter/Recurring,_Equinox-based,_Data_Center_Switchovers | Recurring, Equinox-based, Data Center Switchovers ]], in particular:

- There is no switchback! We are staying in **eqiad** until the next switchover.
- Future switchovers are predictable and take place every 6 months, always in the week of an equinox.

Let #serviceops know if you need more info on the changes.

Thank you!
  • Mailing Lists: adjust the dates, Phab tasks and DCs accordingly
Northward Datacentre Switchover (March 2024)
Dear all,

On Wednesday March 20th 2024, the SRE team will run a planned datacentre switchover, moving all wikis from codfw to eqiad. This is an important periodic test of our tools and procedures, to ensure the wikis will continue to be available even in the event of major technical issues in our primary home. It also gives all our SRE and ops teams a chance to do maintenance and upgrades on systems in codfw that normally run 24 hours a day.

The switchover process requires a brief read-only period for all Foundation-hosted wikis, which will start on Wednesday March 20th 2024 @ 14:00 UTC, and will last for just a few minutes while we execute the migration as efficiently as possible. All our public and private wikis will be continuously available for reading, as usual, but editing will be unavailable during the process. Users will see a notification of the upcoming maintenance, and anyone still editing will be asked to try again in a few minutes.  

CommRel will soon begin notifying communities of the read-only window.

If you like, you can follow along on the day in the public #wikimedia-operations channel on IRC. To report any issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator ticket with the #datacenter-switchover tag (https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Datacenter-Switchover); we'll be monitoring closely for reports of trouble during and after the switchover. The switchover and its preparation will be tracked under https://phabricator.wikimedia.org/T357547.

On behalf of the SRE team, please excuse the disruption. We would also like to thank everyone in the various departments involved in planning this work. If you have any questions, please reply directly to this email.

Kind Regards,

An Engineer