From Wikitech

Gerrit Architecture

Overview

This document provides an overview of the current architecture of the Gerrit instance used at the Wikimedia Foundation. It explains why Gerrit is not behind a CDN and describes the alerting mechanisms in place.

Gerrit operates on dedicated servers rather than behind a Content Delivery Network (CDN). This choice was influenced by historical decisions, operational requirements, and technical limitations discussed below.

Current Alerting Mechanisms

  • Icinga Monitoring: Configured to check Gerrit's HTTPS availability. It sends IRC alerts to the admin contact group.
  • Prometheus Blackbox Monitoring: Configured to monitor HTTP(S) endpoints by checking the HTTP status, following redirects, and verifying the presence of specific content (e.g., "Gerrit Code Review"). It sends alerts to IRC, creates log entries, and creates Phabricator tasks for service availability issues.
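
The checks performed by the blackbox probe can be illustrated with a minimal Python sketch. This is not the Prometheus Blackbox exporter or its configuration; it only mirrors the logic described above (status check, redirect following, content match). The names `ProbeResult`, `evaluate_response`, and `probe` are hypothetical, as is the assumption that any 2xx status counts as success.

```python
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    ok: bool
    reason: str


def evaluate_response(status: int, body: str,
                      expected_text: str = "Gerrit Code Review") -> ProbeResult:
    """Apply the status and content checks to an already-fetched response."""
    # Assumption: any 2xx status is treated as healthy.
    if not 200 <= status < 300:
        return ProbeResult(False, f"unexpected HTTP status {status}")
    if expected_text not in body:
        return ProbeResult(False, f"expected text {expected_text!r} not found")
    return ProbeResult(True, "ok")


def probe(url: str, timeout: float = 10.0) -> ProbeResult:
    """Fetch url and evaluate it; urllib follows redirects by default."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body)
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return ProbeResult(False, f"request failed: {exc}")
```

A call such as `probe("https://gerrit.example.org/")` (a placeholder URL) would return a `ProbeResult` whose `reason` field could feed an alert message.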

Decision to Not Use CDN

Historical Context

Gerrit is not positioned behind a CDN due to a decision made by the Wikimedia SRE team. The decision was made around 2013 and had not been documented previously; instead, it was passed on in discussions between SREs, e.g. during onboarding. As the team grew, this practice no longer sufficed.

The decision to not place Gerrit behind a CDN was made to avoid additional points of failure and to ensure that critical tools remain available independently of the CDN. It was based on the following considerations:

  1. Decoupling from the CDN: Ensuring Gerrit remains available without relying on the CDN, which could introduce additional points of failure.
  2. Technical Complexity: Gerrit requires the SSH port (29418) to be accessible on the same IP address, which complicates using a CDN because the CDN would need to provide TCP proxy functionality.
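
To make the TCP proxy requirement concrete, the following is a minimal sketch of a plain TCP passthrough in Python using `asyncio`. It is an illustration only, not something a CDN or the Wikimedia infrastructure is known to run. The function names and all host and port parameters are hypothetical.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader,
                writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_tcp_proxy(listen_host: str, listen_port: int,
                          backend_host: str, backend_port: int):
    """Listen on one address and forward every connection to the backend."""

    async def handle(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.open_connection(
                backend_host, backend_port)
        except OSError:
            # Backend unreachable: drop the client connection.
            client_writer.close()
            return
        # Relay both directions concurrently; the proxy never inspects
        # the payload, so it works for SSH as well as any other protocol.
        await asyncio.gather(
            _pipe(client_reader, backend_writer),
            _pipe(backend_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Unlike HTTP, where an edge node can terminate and cache requests, SSH traffic would have to be relayed byte for byte in this way. Any CDN placed in front of Gerrit would therefore need such a passthrough on the same address that serves the web interface.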

Re-evaluation of CDN Usage

Despite the historical stance, a debate started in 2024 about whether the advantages of placing Gerrit behind a CDN, such as improved abuse mitigation, would outweigh the risks. Given advances in anti-abuse measures within modern CDN setups and changing organizational needs, the decision may be revisited. The discussion is taking place in Phabricator task T365259.

A key part of this discussion is exploring potential solutions to the technical challenges related to SSH and TCP proxying, should Gerrit be placed behind a CDN.

Emergency Procedures

If Gerrit is down, the following emergency procedures ensure continuity of operations:

  • DNS Updates: If Gerrit is unavailable, DNS changes can still be performed as outlined in the DNS Emergency Procedures.
  • Direct Deployments: Operations requiring immediate changes can be handled directly on deployment servers, bypassing Gerrit until it is back online.