Wikimedia Cloud Services team/EnhancementProposals/Decision record T382607 Who runs wikireplicas cookbooks

From Wikitech

Origin task: phab:T382607

Date of the decision: 2025-05-05

People in the decision meeting (alphabetical order):

No meeting, consensus was reached in the Phabricator task.

Decision taken

Option 4 was chosen.

Rationale

Option 4 removes dependencies between teams and allows both WMCS and Data Platform SRE to run the cookbooks at their preferred time without impacting the other team. The risk of misalignment is kept in check by adding an alert (as discussed in the Phabricator task).

Problem

There are two maintenance tasks that are frequently required for Wiki Replicas:

  • running the sre.wikireplicas.add-wiki cookbook (when a new wiki is created, see docs)
  • running the sre.wikireplicas.update-views cookbook (when the view definitions are updated, see docs)

These tasks have no clear process around them, so they sometimes wait weeks or months before anyone notices they are needed. In December 2024, User:MArostegui_(WMF) and User:FNegri-WMF discussed this, and User:FNegri-WMF volunteered to take responsibility for both tasks. However, we should establish a process that does not rely on a single person.

Another consideration is that both cookbooks currently apply changes to the clouddb* hosts (managed by cloud-services-team) and also to the an-redacteddb* host (managed by Data-Platform-SRE).

Constraints and risks

  • running these tasks should not require any work from Data-Persistence
  • in the WMCS team, only User:FNegri-WMF currently knows the details of how these cookbooks work, the issues that can occur while running them, and how to run the cookbook steps manually if required.
  • there is no clear "inbox" for requests to run the cookbooks, and running them is generally one step in a larger task. Creating such an "inbox" is not in scope for this decision request, but we should consider it after this task is resolved.

Options considered

Option 1 (status quo)

User:FNegri-WMF will run the cookbooks. When he's not around, someone from Data-Platform-SRE will have to step in.

Pros:

  • No additional effort required from the WMCS team

Cons:

  • Relies on a single person
  • No knowledge sharing
  • Could cause delays when User:FNegri-WMF is not available

Option 2

The WMCS team member who is on clinic duty runs the cookbooks.

Only if issues arise do they reach out to User:FNegri-WMF or, if he is not available, to Data-Platform-SRE.

Pros:

  • Follows an established team process

Cons:

  • Coordination needed with Data-Platform-SRE because the cookbook also updates the an-redacteddb1001 host.

Option 3

We ask Data-Platform-SRE to take full responsibility for running those cookbooks. Only in case of issues, they reach out to cloud-services-team.

Pros:

  • This somewhat matches what was proposed in this table, under "Applying view changes".

Cons:

  • Data Platform SREs own their dedicated wikireplica host (an-redacteddb*) but have little context about public-facing wikireplica hosts (clouddb*) and the users and tools relying on them.

Option 4

We add an option to the cookbooks to specify which hosts should be targeted, so that each team (cloud-services-team and Data-Platform-SRE) can run the cookbooks when it's most convenient, and target only the hosts they manage.
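As a rough illustration of what such a host-targeting option could look like, here is a minimal sketch using Python's standard argparse and fnmatch modules. It is not the actual cookbook code: the `--hosts` flag name, the glob-pattern semantics, and the `clouddb1001`/`clouddb1002` host names are assumptions made for the example (only `an-redacteddb1001` appears in this document), and a real cookbook would obtain its host list from its own inventory lookup.

```python
import argparse
from fnmatch import fnmatch

# Hypothetical inventory for illustration only; the clouddb names are placeholders.
ALL_HOSTS = ["clouddb1001", "clouddb1002", "an-redacteddb1001"]


def build_parser():
    """Build an argument parser with a hypothetical --hosts option."""
    parser = argparse.ArgumentParser(description="Sketch of a host-targeting option")
    parser.add_argument(
        "--hosts",
        default="*",
        help="Glob pattern selecting the hosts to target (assumed flag name).",
    )
    return parser


def select_hosts(pattern, hosts=ALL_HOSTS):
    """Return the hosts matching the glob pattern, failing loudly if none match."""
    selected = [host for host in hosts if fnmatch(host, pattern)]
    if not selected:
        raise ValueError(f"no hosts match pattern {pattern!r}")
    return selected


def main(argv=None):
    """Parse the arguments and return the list of hosts that would be targeted."""
    args = build_parser().parse_args(argv)
    return select_hosts(args.hosts)
```

With this shape, one team could call `main(["--hosts", "clouddb*"])` and the other `main(["--hosts", "an-redacteddb*"])`, each touching only the hosts it manages. Raising an error on an empty match is a deliberate choice in this sketch, so that a typo in the pattern does not silently result in a no-op run.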

Pros:

  • More isolation between the teams: we don't have to worry about impacting another team.

Cons:

  • Potential lack of alignment between the views in clouddb* hosts and the views in an-redacteddb* hosts.
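
The alert mentioned in the Rationale is meant to catch this kind of drift. Its actual design is discussed in the Phabricator task and is not reproduced here; the following is only a sketch of one possible check, assuming the view definitions of each host group have already been collected into dictionaries mapping view name to definition text. The function name, group names, and sample views are all hypothetical.

```python
def find_misaligned(views_by_group):
    """Return the sorted names of views that differ, or are missing, between groups.

    views_by_group maps a group name to a dict of {view name: definition text}.
    """
    all_names = set()
    for views in views_by_group.values():
        all_names.update(views)

    misaligned = []
    for name in sorted(all_names):
        # A missing view shows up as None, so it also counts as a difference.
        definitions = {views.get(name) for views in views_by_group.values()}
        if len(definitions) > 1:
            misaligned.append(name)
    return misaligned


# Hypothetical sample data, for illustration only.
sample = {
    "clouddb": {"page": "SELECT a, b FROM page", "user": "SELECT id FROM user"},
    "an-redacteddb": {"page": "SELECT a FROM page", "user": "SELECT id FROM user"},
}
drift = find_misaligned(sample)
```

An alert built on a check like this would fire whenever the returned list is non-empty, signalling that one team has applied view changes that the other has not yet applied.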