SRE/Infrastructure Foundations/Ownership

From Wikitech
Service Category Description Phabricator tag Notes
Install server Bare metal Infrastructure An install server consists of DHCP, TFTP, web proxy (Squid) and APT repository (reprepro) servers.
Ganeti Bare metal Infrastructure A clustered virtual machine management tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. At WMF, KVM is the only enabled hypervisor.
Puppet Configuration Management Systems Puppet is the main configuration management tool used on the Wikimedia clusters.

The puppet agent is the client daemon that runs on all servers and manages each machine using configuration information gathered from puppetmasterd. Infrastructure part
PCC Configuration Management Systems PCC is the Puppet compiler. Compilers run the Puppet Server and PuppetDB services, as well as a file sync client. When triggered by a web endpoint, file sync takes changes from the working directory on the primary server and deploys the code to a live code directory. File sync then deploys that code to all compilers.
Puppetboard Configuration Management Systems Puppetboard is a web interface to PuppetDB that aims to replace the reporting functionality of the Puppet Enterprise console.
DebMonitor Configuration Management Systems DebMonitor is a Debian package tracker website and tool developed at the Wikimedia Foundation and used to track installed and upgradable packages across the fleet. It consists of the DebMonitor website and the DebMonitor client.
Homer Configuration Management Systems Homer is our homemade network configuration manager. It takes variables from Netbox and YAML files and runs them through Jinja templates to generate Juniper-compatible configuration. Homer can then send those configurations to selected network devices, for a diff or a safe commit.
Cookbooks Orchestration Tooling
Spicerack Orchestration Tooling Spicerack is a Python library to orchestrate tasks in the Wikimedia Foundation production environment. It comes with an easy API and a cookbook entry point script that makes it possible to write simple Cookbooks to automate and orchestrate tasks.
Server Lifecycle/Reimage Orchestration Tooling Fully automated OS (re)installation for physical hosts.
Debdeploy Orchestration Tooling Debdeploy allows the deployment of software updates in Debian (or Debian-based) environments on a large scale. It is based on Cumin; updates are initiated via the debdeploy tool running on the Cumin master. Servers can be grouped into arbitrary sets of servers/services based on the Cumin syntax.
Conftool Orchestration Tooling Conftool is a set of tools we use to sync and manage the dynamic state configuration for services (Varnish backends, PyBal pools, DNS discovery entries, and some variables in the MediaWiki configuration). This configuration is stored in etcd, a distributed key/value store.
Dbctl Orchestration Tooling Dbctl is a tool based on conftool that stores MediaWiki's database load balancer configuration in etcd.

In production, the only hosts with dbctl installed are the cumin cluster management hosts (e.g. cumin1001).

Cumin Orchestration Tooling Cumin is a flexible and scalable automation and orchestration framework for executing multiple commands on multiple hosts in parallel.

It makes complex host selections easy through a user-friendly query language that can interface with different backend modules and combine their results for a fine-grained selection. The transport layer can also be selected, and can provide multiple execution strategies. The outputs of the executed commands are automatically grouped for an easy-to-read result.

Wmflib Orchestration Tooling A Python package that contains custom modules to interact with the WMF production infrastructure.

Unlike Spicerack and its Cookbooks, it does not require any special privilege to run, so it can be used in any script throughout the fleet. It also removes the need to re-implement the same functionality over and over again.

PKI Infrastructure security and packaging A public key infrastructure is a set of roles, policies, hardware, software and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption. We currently use CFSSL to provide and manage PKI solutions. Clients can use the CFSSL API endpoint, which requires authenticating with the puppet agent certificate. In addition to this client authentication requirement, API requests also need to be signed with an HMAC using a secret key (available in the puppet private repo).
CAS-SSO Infrastructure security and packaging The Wikimedia Developer SSO Portal is a single sign-on (SSO) infrastructure built on Apereo CAS. When logging into a CAS-enabled website without an active SSO session, you'll be redirected to the CAS login page. The CAS service collects LDAP group memberships and makes them available to services for making authorisation choices. After authentication, users are redirected to the initiating service.
Reprepro Infrastructure security and packaging Reprepro is able to manage multiple repositories for multiple distribution versions in one package pool. It can process updates from an incoming directory, copy package (references) between distribution versions, list all packages and/or package versions available in the repository, etc. Reprepro maintains an internal database (a .DBM file) of the contents of the repository, which makes it quite fast and efficient.
Cowbuilder Infrastructure security and packaging A module used to populate a Debian/Ubuntu package building environment. Meant to be used in the Wikimedia environment but could be adapted for other environments as well.
Netbox Infrastructure security Netbox is an "IP address management (IPAM) and data center infrastructure management (DCIM) tool".
Netmon Infrastructure security Netmon is a network monitoring system with high-performance traffic sniffing technology.
RPKI Infrastructure security Resource Public Key Infrastructure is a public key infrastructure framework to support improved security for the Internet's BGP routing infrastructure. RPKI provides a way to connect Internet number resource information to a trust anchor.
Cloudflare Magic Transit Infrastructure security Cloudflare's Magic Transit protects IP subnets from DDoS attacks. It uses Cloudflare's global network to mitigate attacks, employing two networking protocols: BGP for routing and GRE for encapsulation. Cloudflare wrote a case study about our use of Magic Transit.
NEL Infrastructure security Network Error Logging is a mechanism that can be configured via the NEL HTTP response header. This header allows web sites and applications to opt in to receive reports about failed (and, if desired, successful) network fetches from supporting browsers.
keyholder Infrastructure security A set of scripts that allows a group of users to use an SSH key without sharing the private key with the members of the group.
Failoid Miscellanea Fallback backend that immediately closes the connection, used in the DNS/Discovery setup.
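
The PKI entry above notes that API requests must be signed with an HMAC using a secret key. As a rough illustration of that idea only, the sketch below signs a JSON payload with HMAC-SHA-256 and checks the signature again. The envelope layout (`token` and `request` fields, base64-encoded), the hex-encoded key, the choice of SHA-256, and the function names `sign_request` and `verify_request` are all assumptions made for this example. They are not taken from this page and may differ from what the production CFSSL setup expects.

```python
import base64
import hashlib
import hmac
import json


def sign_request(payload: dict, hex_key: str) -> dict:
    """Wrap a payload in an HMAC-signed envelope (hypothetical layout)."""
    key = bytes.fromhex(hex_key)  # assumption: the shared secret is hex-encoded
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hmac.new(key, body, hashlib.sha256).digest()
    return {
        "token": base64.b64encode(digest).decode("ascii"),
        "request": base64.b64encode(body).decode("ascii"),
    }


def verify_request(envelope: dict, hex_key: str) -> bool:
    """Recompute the HMAC over the enclosed request and compare in constant time."""
    key = bytes.fromhex(hex_key)
    body = base64.b64decode(envelope["request"])
    expected = hmac.new(key, body, hashlib.sha256).digest()
    return hmac.compare_digest(expected, base64.b64decode(envelope["token"]))


if __name__ == "__main__":
    demo_key = "00112233445566778899aabbccddeeff"  # placeholder, not a real secret
    envelope = sign_request({"hostname": "host.example.org"}, demo_key)
    print(verify_request(envelope, demo_key))
```

The signature only proves possession of the shared secret. In the setup described above it complements, rather than replaces, the client certificate check.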