SRE/Service Operations/Ownership

From Wikitech
Service/Procedure Description Phabricator tag Notes
WikiKube Kubernetes Cluster Kubernetes (often abbreviated k8s) is an open-source system for automating the deployment and management of applications running in containers.
MediaWiki servers The application servers (or app servers) are the several hundred Apache servers that run the MediaWiki backend software (written in PHP).
Memcached for MediaWiki There are two logical pools of memcached servers for MediaWiki. They are critical for the performance of all sites and are used extensively.
Redis Misc Redis is used in Wikimedia production for:

changeprop (role::redis::misc)
As a cache and queue backend in ORES
Receiver of sampled profile data from PHP, as part of the sampling/profiling pipeline (Arc Lamp)

Shellbox Shellbox is a library for remote command execution, and a server for secure command execution. It was primarily implemented to sandbox lilypond (used by the Score extension) and provide a way for MediaWiki to utilize external binaries without needing them to be in the same container. Shellbox relies on Kubernetes (and Linux containers/namespaces) to provide isolation and resource limits for external commands.
Datacenter Switchover A datacenter switchover (from eqiad to codfw, or vice-versa) comprises switching over multiple different components, some of which can happen independently and many of which need to happen in lockstep. This page documents all the steps needed to switch over from a master datacenter to another one, broken up by component. SRE Service Operations maintains the process and software necessary to run the switchover.
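The distinction between components that can switch independently and components that must switch in lockstep can be pictured as a dependency graph. The following is a minimal sketch of that idea only, not the actual switchover tooling: the component names and their dependencies are hypothetical placeholders, and the real steps are the ones documented on the switchover page.

```python
from graphlib import TopologicalSorter

# Hypothetical components and ordering constraints, for illustration only.
# Each key maps to the set of components that must be switched before it.
DEPENDENCIES = {
    "component-a": set(),             # no prerequisites
    "component-b": set(),             # independent of component-a
    "component-c": {"component-a"},   # must follow component-a
    "component-d": {"component-c"},   # must follow component-c
}


def switchover_batches(dependencies):
    """Group components into ordered batches.

    Components in the same batch do not depend on each other, so they
    could be switched independently. Batches must be run in order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(switchover_batches(DEPENDENCIES), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

In this sketch, component-a and component-b land in the same batch because neither depends on the other, while component-c and component-d each wait for their predecessor.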
Service Level Objectives Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
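As a rough illustration of how an SLI relates to an SLO, the sketch below computes a ratio-based SLI and the share of error budget still unspent. It assumes the common "good events divided by total events" definition; the function names, the example numbers, and the target are assumptions for illustration and are not taken from any Wikimedia SLO.

```python
def sli(good_events: int, total_events: int) -> float:
    """Return the indicator as the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int,
                           slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget used, 0.0 means fully used, and a negative value
    means the objective was missed. slo_target is a fraction such as 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 requests, 5 failures, 99.9% target.
    print(sli(9_995, 10_000))
    print(error_budget_remaining(9_995, 10_000, slo_target=0.999))
```

With these hypothetical numbers, the target allows about 10 bad requests out of 10,000, so 5 failures use roughly half of the budget.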
Kafka-main kafka-main is the low-volume, critical production services cluster. Talk to us before starting to send events there. kafka-main is currently used directly by Event_Platform/EventGate and change-propagation.
GitLab GitLab is reachable at https://gitlab.wikimedia.org/. We run multiple instances of GitLab:

gitlab1001 runs production GitLab serving https://gitlab.wikimedia.org/
gitlab2001 runs a passive GitLab replica serving https://gitlab-replica.wikimedia.org/ (WIP)
gitlab-ansible-test in WMCS gitlab-test project
gitlab in WMCS gitlab-test project

VRT System The Volunteer Response Team (VRT) System is a ticket and process management system. The software used is Znuny, a fork of OTRS.
Etherpad Etherpad is an open-source, web-based real-time editor, allowing authors to simultaneously edit a text document, and see all of the participants' edits in real-time, with the ability to display each author's text in their own color. There is also a chat box in the sidebar to allow meta communication.