SRE/Infrastructure Foundations/Ownership
Service | Category | Description | Phabricator tag | Notes |
---|---|---|---|---|
Install server | Bare metal Infrastructure | An install server consists of DHCP, TFTP, web proxy (Squid) and apt.wikimedia.org (reprepro) servers. | This information is outdated. | |
Ganeti | Bare metal Infrastructure | Cluster virtual machine management tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. At WMF, KVM is the only enabled hypervisor. | ||
Puppet | Configuration Management Systems | Puppet is the main configuration management tool used on the Wikimedia clusters. | https://phabricator.wikimedia.org/tag/puppet/ | Infrastructure part |
PCC | Configuration Management Systems | PCC is the Puppet compiler. Compilers run the Puppet Server and PuppetDB services, as well as a file sync client. When triggered by a web endpoint, file sync takes changes from the working directory on the primary server and deploys the code to a live code directory. File sync then deploys that code to all compilers. | ||
Puppetboard | Configuration Management Systems | Puppetboard is a web interface to PuppetDB that aims to replace the reporting functionality of the Puppet Enterprise console. | https://puppetboard.wikimedia.org/ | |
DebMonitor | Configuration Management Systems | DebMonitor is a Debian package tracker website and tool developed at the Wikimedia Foundation and used to track installed and upgradable packages across the fleet. It consists of the DebMonitor website and the DebMonitor client. | https://debmonitor.wikimedia.org/ | |
Homer | Configuration Management Systems | Homer is our homemade network configuration manager. It takes variables from Netbox and YAML files and runs them through Jinja templates to generate Juniper-compatible configuration. Homer can then send those configurations to selected network devices, for a diff or a safe commit. | https://phabricator.wikimedia.org/tag/homer/ | |
Cookbooks | Orchestration Tooling | |||
Spicerack | Orchestration Tooling | Spicerack is a Python library to orchestrate tasks in the Wikimedia Foundation production environment. It comes with an easy API and a cookbook entry point script that makes it possible to write simple Cookbooks to automate and orchestrate tasks. | ||
Server Lifecycle/Reimage | Orchestration Tooling | Fully automated OS (re)installation for physical hosts. | ||
Debdeploy | Orchestration Tooling | Debdeploy allows the deployment of software updates in Debian (or Debian-based) environments on a large scale. It is based on Cumin; updates are initiated via the debdeploy tool running on the Cumin master. Servers can be grouped into arbitrary sets of servers/services based on the Cumin syntax. | ||
Conftool | Orchestration Tooling | Conftool is a set of tools we use to sync and manage the dynamic state configuration for services (Varnish backends, the PyBal pools, the DNS discovery entries, and some variables in the MediaWiki configuration). This configuration is stored in the distributed key/value store etcd. | ||
Dbctl | Orchestration Tooling | Dbctl is a tool based on conftool that stores MediaWiki's database load balancer configuration in etcd. In production, the only hosts with dbctl installed are the cumin cluster management hosts (e.g. cumin1001). | ||
Cumin | Orchestration Tooling | Cumin is a flexible and scalable automation and orchestration framework for executing multiple commands on multiple hosts in parallel. It makes it easy to perform complex host selections through a user-friendly query language, which can interface with different backend modules and combine their results for a fine-grained selection. The transport layer can also be selected, and can provide multiple execution strategies. The outputs of the executed commands are automatically grouped for an easy-to-read result. | ||
Wmflib | Orchestration Tooling | A Python package that contains custom modules to interact with the WMF production infrastructure. Unlike Spicerack and its Cookbooks, it does not require any special privilege to run, so it can be used in any script throughout the fleet. It also removes the need to re-implement the same functionality over and over again. | ||
PKI | Infrastructure security and packaging | A public key infrastructure is a set of roles, policies, hardware, software and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption. We currently use CFSSL to provide and manage PKI solutions. Clients can use the CFSSL API endpoint, which requires the Puppet agent certificate. In addition to the client authentication requirement, API requests also need to be signed with an HMAC using a secret key (available in the Puppet private repo). | ||
CAS-SSO | Infrastructure security and packaging | The Wikimedia Developer SSO Portal at idp.wikimedia.org is a single sign-on (SSO) infrastructure built on Apereo CAS. When logging into a CAS-enabled website without an active SSO session, you'll be redirected to the CAS login page. The CAS service collects LDAP group memberships and makes them available to services for making authorisation choices. After authentication, users are redirected to the initiating service. | https://phabricator.wikimedia.org/tag/cas-sso/ | |
Reprepro | Infrastructure security and packaging | Reprepro is able to manage multiple repositories for multiple distribution versions in one package pool. It can process updates from an incoming directory, copy package (references) between distribution versions, list all packages and/or package versions available in the repository, etc. Reprepro maintains an internal database (a .DBM file) of the contents of the repository, which makes it quite fast and efficient. | ||
Cowbuilder | Infrastructure security and packaging | A module used to populate a Debian/Ubuntu package building environment. Meant to be used in the Wikimedia environment but could be adapted for other environments as well. | ||
Netbox | Infrastructure security | Netbox is an "IP address management (IPAM) and data center infrastructure management (DCIM) tool". | https://phabricator.wikimedia.org/tag/netbox/ | https://netbox.wikimedia.org/ |
Netmon | Infrastructure security | Netmon is a network monitoring system with high-performance traffic sniffing technology. | ||
RPKI | Infrastructure security | Resource Public Key Infrastructure is a public key infrastructure framework to support improved security for the Internet's BGP routing infrastructure. RPKI provides a way to connect Internet number resource information to a trust anchor. | ||
Cloudflare Magic Transit | Infrastructure security | Cloudflare's Magic Transit protects IP subnets from DDoS attacks. It uses Cloudflare's global network to mitigate attacks, employing two networking protocols: BGP and GRE, for routing and encapsulation. Cloudflare wrote a case study about our use of Magic Transit. | ||
NEL | Infrastructure security | Network Error Logging is a mechanism that can be configured via the NEL HTTP response header. This header allows web sites and applications to opt in to receive reports about failed (and, if desired, successful) network fetches from supporting browsers. | ||
keyholder | Infrastructure security | A set of scripts that allow a group of users to use an SSH key without sharing the private key with the members of the group. | https://phabricator.wikimedia.org/tag/keyholder/ | |
Failoid | Miscellanea | Fallback backend that immediately closes the connection, used in the DNS/Discovery setup. | ||
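
As an illustration of the NEL entry in the table, the following is a minimal sketch of how a service might build the `NEL` response header together with the `Report-To` header that names the reporting endpoint. It is not taken from the Wikimedia configuration: the function name `build_nel_headers`, the group name `network-errors`, the collector URL and the default values are all assumptions chosen for the example. The `success_fraction` field is what corresponds to the optional reporting of successful fetches.

```python
import json


def build_nel_headers(report_url, group="network-errors", max_age=86400,
                      failure_fraction=1.0, success_fraction=0.0):
    """Return the Report-To and NEL response headers as a dict of strings.

    All defaults here are illustrative assumptions, not production values.
    """
    # Sampling fractions must be probabilities between 0.0 and 1.0.
    for name, value in (("failure_fraction", failure_fraction),
                        ("success_fraction", success_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # Report-To declares a named endpoint group the browser can send reports to.
    report_to = {
        "group": group,
        "max_age": max_age,
        "endpoints": [{"url": report_url}],
    }
    # NEL opts the origin in and points at the group declared above.
    nel = {
        "report_to": group,
        "max_age": max_age,
        "failure_fraction": failure_fraction,
        "success_fraction": success_fraction,
    }
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


if __name__ == "__main__":
    # Hypothetical collector URL, used only for demonstration.
    for name, value in build_nel_headers("https://reports.example.org/nel").items():
        print(f"{name}: {value}")
```

With the defaults above, every failed fetch is eligible for reporting and successful fetches are not reported. Raising `success_fraction` above zero would sample successful fetches as well.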