MariaDB/Zarcillo
Zarcillo
Zarcillo (https://zarcillo.wikimedia.org/) is a tool that displays information about Wikimedia MariaDB shards and servers. Unlike Orchestrator, it is publicly accessible, although many functions are available only to SREs.
Zarcillo is being developed as part of T384810 and its related tasks.
The service provides a web UI at https://zarcillo.wikimedia.org/ .
It runs on Kubernetes and provides:
- A web UI and API to display database status
- A view of depooled hosts (T384212)
- Depooling/repooling of MariaDB replicas based on health and role transitions
- Help with switching between master, candidate-master, and replica roles
- Lock coordination to avoid conflicting maintenance/failover activities
- Tracking of the source and validity of instance status “ground truth”
Source code: Zarcillo
Main tracking ticket: T384810
See Architecture documentation for more details.
Grafana dashboards:
- https://grafana.wikimedia.org/d/8USki6XVk/zarcillo
- https://grafana-rw.wikimedia.org/d/celrzpf6av8qob/zarcillo
Web UI
The web UI is published at zarcillo.wikimedia.org.
After logging in, it provides the pages listed below.
Most tables are sortable. Some headings can be hotlinked (e.g. https://zarcillo.wikimedia.org/ui/weights#es2).
Instances dashboard
See https://zarcillo.wikimedia.org/ui/instances
Displays "raw" contents from:
Locks
See https://zarcillo.wikimedia.org/ui/locks
Lists locks by instance and user.
Also provides:
- Form to acquire new locks
- Button to release locks
Hosts dashboard
See https://zarcillo.wikimedia.org/ui/hosts
Shows database hosts including:
- DC
- Location (datacenter)
- Sections served (with a sleepy icon 😴 when not pooled)
- Kernel version
- MariaDB version
- Active locks
- Alarms (Icinga / Alertmanager)
It allows highlighting old kernel or MariaDB versions using the Filter button.
It also allows adding hosts from the UI.
Schema change summary
See https://zarcillo.wikimedia.org/ui/schema_change
Shows current and past schema changes tracked by schema_change_helper.py (see PR 42).
Hovering on checkboxes shows who ran the helper and when.
The icons represent:
- No icon: schema change never started.
- Hourglass (⏳): schema change pending during the current auto_schema run. The host has not been depooled yet.
- Spinner: depooling, the schema change, or pooling is in progress on the instance.
- Checkmark (✅): schema change completed.
Sections dashboard
See https://zarcillo.wikimedia.org/ui/sections
Primary dashboard showing all sections and their hosts.
For each host it shows:
- Hostname, instance port, role
- Replication lag (from the heartbeat table via Prometheus metrics mysql_heartbeat_now_timestamp_seconds and mysql_heartbeat_stored_timestamp_seconds)
- Host uptime (node_boot_time_seconds)
- Tags (pooled, preferred candidate, alarms, etc.)
- Candidate score (CS)
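As an illustration of how the lag and uptime columns can be derived from the metrics named above, here is a minimal sketch. It assumes lag is the difference between the two heartbeat timestamps and uptime is the current time minus the boot time; the function names are hypothetical and are not taken from the Zarcillo source.

```python
def replication_lag_seconds(now_ts: float, stored_ts: float) -> float:
    """Lag from the two heartbeat metrics (assumed formula).

    now_ts:    value of mysql_heartbeat_now_timestamp_seconds
    stored_ts: value of mysql_heartbeat_stored_timestamp_seconds
    Negative differences (clock skew) are clamped to zero.
    """
    return max(0.0, now_ts - stored_ts)


def uptime_seconds(current_ts: float, boot_ts: float) -> float:
    """Uptime from node_boot_time_seconds (boot_ts) and the current time."""
    return current_ts - boot_ts


# The same lag calculation written as a PromQL expression (labels omitted).
LAG_PROMQL = (
    "mysql_heartbeat_now_timestamp_seconds"
    " - mysql_heartbeat_stored_timestamp_seconds"
)
```

For example, `replication_lag_seconds(1000.5, 1000.0)` yields half a second of lag.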
Zarcillo computes a candidate score (CS) for replica hosts used in master switchover decisions.
The score is based on:
- Existing alarms (none is better)
- Replication lag (lower is better)
- Uptime (higher is better)
- Kernel version (newer is better)
- MariaDB version (newer is better)
This supports safer switchover operations and rolling upgrades.
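The sketch below shows one way such a score could combine the five factors listed above. The weights, the 30-day uptime cap, and all names are assumptions made for illustration; the actual formula used by Zarcillo is not described here.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical record; field names are illustrative only.
    name: str
    alarms: int            # number of active alarms
    lag_seconds: float     # replication lag
    uptime_seconds: float  # host uptime
    kernel: tuple          # kernel version, e.g. (6, 1, 0)
    mariadb: tuple         # MariaDB version, e.g. (10, 11, 9)


UPTIME_CAP = 30 * 86400  # assumed: uptime beyond 30 days earns no extra credit


def candidate_score(r: Replica, newest_kernel: tuple, newest_mariadb: tuple) -> float:
    """Higher is better. All weights below are made up for this example."""
    score = 40.0 if r.alarms == 0 else 0.0                      # no alarms is better
    score += 20.0 / (1.0 + r.lag_seconds)                       # lower lag is better
    score += 20.0 * min(r.uptime_seconds / UPTIME_CAP, 1.0)     # higher uptime is better
    score += 10.0 if r.kernel == newest_kernel else 0.0         # newest kernel
    score += 10.0 if r.mariadb == newest_mariadb else 0.0       # newest MariaDB
    return score
```

With these assumed weights, a replica with no alarms, no lag, 30 days of uptime, and the newest kernel and MariaDB versions scores 100, while an alarmed, lagging, freshly rebooted host on older versions scores close to zero.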
Weights dashboard
See https://zarcillo.wikimedia.org/ui/weights
Shows instance weights grouped by section, hostname, and groups.
Highlights standardized weights in 2025 in the “std” column.
Flags differences between eqiad and codfw in the “diff” column.
Candidate planner
See https://zarcillo.wikimedia.org/ui/planner
Uses z3 to compute candidate locations and database movements to minimize the risk of rack failures impacting multiple masters or candidates.
Added in https://gitlab.wikimedia.org/repos/sre/wmfmariadbpy/-/merge_requests/20 per T371362 - currently at prototype stage.
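To illustrate the kind of problem the planner solves, here is a toy sketch that picks one candidate per section so that the busiest rack holds as few masters and candidates as possible. It uses exhaustive search rather than z3, which only works for very small inputs; the data layout, function name, and objective are assumptions, not the planner's actual model.

```python
from collections import Counter
from itertools import product


def plan_candidates(master_racks, replicas_by_section):
    """Choose one candidate per section, minimizing the worst rack load.

    master_racks:        {section: rack of the current master}
    replicas_by_section: {section: {host: rack}}
    Returns (worst_rack_load, {section: chosen_host}).
    """
    sections = sorted(replicas_by_section)
    options = [sorted(replicas_by_section[s]) for s in sections]
    best = None
    for choice in product(*options):
        # Count masters per rack, then add the chosen candidates.
        load = Counter(master_racks.values())
        for section, host in zip(sections, choice):
            load[replicas_by_section[section][host]] += 1
        worst = max(load.values(), default=0)
        if best is None or worst < best[0]:
            best = (worst, dict(zip(sections, choice)))
    return best
```

A constraint solver such as z3 expresses the same objective declaratively and scales to realistic fleet sizes, where enumerating every combination is not feasible.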
Clone dashboard
See https://zarcillo.wikimedia.org/ui/host_clone_events
Shows the status of the MariaDB clone cookbook runs as per T417608
API and documentation
See https://zarcillo.wikimedia.org/apidocs
OpenAPI/Swagger documentation for all API and UI endpoints.
Notes:
- /api returns JSON
- /content is for HTMX building blocks
- /healthz is for Kubernetes health checks
- /metrics exposes Prometheus metrics
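A minimal sketch of a client for the JSON endpoints, using only the Python standard library. The helper names are hypothetical, authentication is not handled, and concrete paths under /api should be taken from the API documentation page rather than from this example.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://zarcillo.wikimedia.org/"


def endpoint_url(path: str) -> str:
    """Build an absolute URL for a path such as 'healthz' or '/metrics'."""
    return urljoin(BASE_URL, path.lstrip("/"))


def fetch_json(path: str, timeout: float = 10.0):
    """GET a path and decode the JSON body.

    Assumes the endpoint is reachable without a login; endpoints that
    require one will fail with an HTTP error.
    """
    with urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.load(resp)
```

`endpoint_url("/healthz")` returns `https://zarcillo.wikimedia.org/healthz`; `fetch_json` would be called with a path listed in the API documentation.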
Development
You can use Just and the related justfile.
List available commands with:
$ just -l
Available recipes:
copy_prod_tables_from_db1215_to_preprod
deploy_prod_once # Deploy current container
deploy_prod_polling # Poll/deploy on changes
fetch_logs # Fetch production logs
fetch_logs_and_follow # Fetch raw production logs
generate_html_docs # Requires asciidoctor
generate_run_local_container
import_prod_tables_from_db1215_to_localdev
ingest_puppet_hiera_data # Import Hiera data
ingest_puppet_role_data # Import Role data
kube_get_pods # List K8s pods
local_test
local_test_automation
log_on_local_mariadb
log_on_preprod_mariadb
log_on_prod_mariadb
run_ci_podman_devel
run_ci_podman_prod # Run CI container locally
setup_local_dev_pod
setup_local_mariadb
setup_local_testbed
zap_local_dev_pod
setup_local_testbed sets up dedicated containers and populates MariaDB locally. It requires a local copy or symlink of service-template/generate_local_podman_container.py.
For deploying, see the deploy_prod* recipes. They need deploy.json, deployer.py, and setup_service.py in the ./local directory.