Wikimedia Cloud Services team/EnhancementProposals/Decision record T346153 Toolforge (re)architecture

From Wikitech

Origin task: phab:T346153

Date of the decision: 2023-10-12

No decision meeting was held; the decision was made in the task.


Decision taken

Option 1 - microservices on the backend + API gateway + a single CLI

Rationale

The ability to manage the different backend services in different ways was considered worth the potential extra complexity of a microservices architecture. In addition, consolidating the current CLI clients opens up the possibility of moving toward a single binary that users could one day install locally. The API gateway decouples the CLI (and other future clients, such as a UI) from the backing microservices.
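As an illustration of the decoupling described above, the following is a minimal sketch of prefix-based routing in a gateway. The route table, service names, and internal URLs are hypothetical and are not taken from the actual Toolforge configuration; the point is only that clients see one stable set of public paths while the backing services can change independently.

```python
from typing import Dict, Optional

# Hypothetical route table: public path prefix -> internal service base URL.
# These names and URLs are illustrative assumptions, not real Toolforge values.
ROUTES: Dict[str, str] = {
    "/jobs/": "http://jobs-api.internal:8000",
    "/builds/": "http://builds-api.internal:8000",
    "/envvars/": "http://envvars-api.internal:8000",
}


def resolve_backend(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend URL for a public path, or None if no route matches."""
    # Check longer prefixes first so a more specific route wins over a broader one.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            remainder = path[len(prefix):]
            return routes[prefix].rstrip("/") + "/" + remainder
    return None
```

Under these assumptions, moving a service only requires changing its entry in the route table; the CLI and any future UI keep calling the same public paths.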

Problem

Over the last year, the Toolforge ecosystem has evolved beyond 'jobs' and 'webservice' to include the new build system, featuring a system for environment variables and secrets, a potential 'deploy' subcommand, and more. In the current architecture, each subsystem has its own API and CLI. As Toolforge continues to evolve, this decision request asks whether to continue this pattern of development or change to a different architecture.

Constraints and risks

Risks of keeping the current architecture

  • Increased Complexity: The architecture could become increasingly complex, making it harder to manage, maintain, and onboard new contributors.
  • Resource Inefficiency: The current architecture might require more resources for maintenance than a new, more efficient architecture, leading to wasteful allocation of engineering time.
  • Community Disengagement: The existing complexity may deter new contributions.

Risks of moving to a different architecture

  • Implementation Challenges: Transitioning to a new architecture could be resource-intensive (in engineering time) and meet resistance.
  • Backward Compatibility: Changes must consider the impact on existing services, posing a risk of breaking functionality or affecting the user experience.
  • Operational Overhead: Unexpected complexities in deployment and monitoring may arise, requiring more operational effort.

Options considered

Option 1

Backend (API gateway + several per-service APIs) + Client (single codebase/monolith)

Pros:

  • Decoupling between frontend and backend through the API gateway.
  • Simpler to improve the client user experience with a single package.
  • Client: easier to make big contributions (all code together, shipped as one).
  • Backend: easier to make small contributions (easier to understand, test, and deploy just one small part).
  • Increased flexibility and scalability.
  • Backend: easy to move to a monolithic system.
  • Backend: easier for others to reuse outside Toolforge.

Cons:

  • Client: hard to move to a decoupled system.
  • Client: harder to make small contributions (must test all flows for any change, must understand the whole system).
  • Backend: harder to make big contributions (split repos, split deployments).
  • Potential for backend operational complexity due to multiple service APIs.
  • The API gateway introduces an additional system to maintain.
  • Backend: split repos could lead to a high degree of code repetition and boilerplate.
  • Client: harder for others to reuse outside Toolforge.
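To make the client side of Option 1 more concrete, here is a rough sketch of a single CLI codebase that exposes each service as a subcommand and sends every request through one gateway. The program name, subcommand list, gateway URL, and URL layout are all assumptions made for illustration; they do not describe the real Toolforge CLI.

```python
import argparse
from typing import List

# Hypothetical gateway address; not a real Toolforge endpoint.
GATEWAY_URL = "https://api.example.invalid"

# Hypothetical set of services exposed as subcommands of one client.
SERVICES = ("jobs", "build", "envvars")


def build_parser() -> argparse.ArgumentParser:
    """Build one parser with a subcommand per backend service."""
    parser = argparse.ArgumentParser(prog="toolforge-sketch")
    subparsers = parser.add_subparsers(dest="service", required=True)
    for name in SERVICES:
        service_parser = subparsers.add_parser(name)
        service_parser.add_argument("action", help="operation to run, e.g. 'list'")
    return parser


def request_url(argv: List[str]) -> str:
    """Translate CLI arguments into the gateway URL the client would call."""
    args = build_parser().parse_args(argv)
    return f"{GATEWAY_URL}/{args.service}/{args.action}"
```

For example, `request_url(["jobs", "list"])` builds a URL under the `jobs` prefix of the gateway. The client only needs to know the gateway address, which is the trade-off listed above: a single package is simpler for users, but any change ships with the whole client.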

Option 2

Backend (single API service with all services in it) + Client (single codebase/monolith)


Pros:

  • Highly integrated and simplified operation/deployment.
  • Simpler to improve the client user experience with a single package.
  • Client and backend: easier to make big contributions (all code together, shipped as one).
  • Removes the API gateway as a separate system (though its functionality must be reimplemented in the API).

Cons:

  • Client and backend: hard to move to a decoupled system.
  • Client and backend: harder to make small contributions (must test all flows for any change, must understand the whole system).
  • Reduced flexibility and scalability.
  • Tight coupling could make future changes more challenging.
  • Client and backend: harder for others to reuse outside Toolforge.

Option 3

Backend (API gateway + per-service APIs) + Client (per-service codebases) - Status quo

Pros:

  • Decoupling between frontend and backend through the API gateway.
  • Existing familiarity and no immediate changes required.
  • High degree of decoupling allows services to be managed independently, from development to deployment.
  • Client and backend: easier to make small contributions (easier to understand, test, and deploy just one small service).
  • Client and backend: easy to move to a monolithic system.
  • Client and backend: easier for others to reuse outside Toolforge.

Cons:

  • Client and backend: harder to make big contributions (split repos, split deployments).
  • More complex to improve the client user experience than with a single package.
  • Potential for backend operational complexity due to multiple service APIs.
  • Harder to reason about the system as a whole.
  • Client and backend: split repos could lead to a high degree of code repetition and boilerplate.