Puppet/Pontoon/Services

Pontoon is designed to be a realistic but smaller testbed for production. The trade-offs change too: we can exchange some reliability and availability for a simpler system that’s easy to understand and operate, with all the relevant functionality intact.

What's a service?

One of Pontoon’s requirements is that prototyping must be easy, so setting up a new service should take only a few lines of configuration. Client and server configurations for the service must work the same as in production, save for a few exceptions such as resource limits.

A service in Pontoon is defined in service::catalog and has the following characteristics:

  • The service’s name is its catalog key
  • Deployed to the hosts running the role named in its role catalog key, listening on the port given in its port key
  • Known on the network as <service name>.discovery.wmnet
  • HTTP(S) only. TLS is recommended but optional

These criteria are meant to include most if not all production services used for user requests.
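
As a rough illustration of the characteristics above, the following Python sketch models a single catalog entry. The Service class, the tls field, and the example role name are hypothetical and exist only for this example; the real service::catalog schema lives in Puppet/hiera and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical model of one service::catalog entry (not the real schema)."""

    name: str         # the catalog key doubles as the service name
    role: str         # hosts running this role serve the service
    port: int         # port the service listens on
    tls: bool = True  # TLS is recommended but optional

    @property
    def discovery_name(self) -> str:
        # Services are known on the network as <service name>.discovery.wmnet
        return f"{self.name}.discovery.wmnet"

    @property
    def scheme(self) -> str:
        # Only HTTP(S) services are covered
        return "https" if self.tls else "http"


# Example entry; the role name and port are made up for illustration.
thanos = Service(name="thanos", role="example::role", port=443)
print(f"{thanos.scheme}://{thanos.discovery_name}:{thanos.port}")
```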

Load Balancing

The default LB implementation performs layer four load balancing, proxying client connections to the hosts assigned to the service’s role. Backend pool selection happens at the HTTP and TLS layers: Host header based for the former and SNI based for the latter. The implementation consists of haproxy on a dedicated host running the pontoon::lb role; see also the initial Gerrit review.
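
The pool selection rule can be pictured with a small Python sketch. This is a toy model of the idea only, not the haproxy configuration Pontoon ships: the select_pool function, the pools mapping, and the backend names are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def select_pool(
    pools: Dict[str, List[str]],
    host_header: Optional[str] = None,
    sni: Optional[str] = None,
) -> Optional[List[str]]:
    """Return the backend pool for a connection, or None if nothing matches."""
    if sni is not None:
        name = sni                           # TLS: server name sent by the client
    elif host_header is not None:
        name = host_header.split(":", 1)[0]  # HTTP: drop an optional ":port" suffix
    else:
        return None
    return pools.get(name.lower().rstrip("."))


# Hypothetical pool: the hosts assigned to the service's role.
pools = {"thanos.discovery.wmnet": ["backend-a:443", "backend-b:443"]}
print(select_pool(pools, sni="thanos.discovery.wmnet"))
print(select_pool(pools, host_header="Thanos.discovery.wmnet:80"))
```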

Service Discovery

Discovery is DNS based and happens locally on all hosts. By default, all services with a role key are pointed to the stack’s load balancer. Services will resolve on all main production domains (discovery.wmnet, svc.eqiad.wmnet, svc.codfw.wmnet).

Implementation-wise, systems are configured to use 127.0.0.53 as their resolver. Dnsmasq listens on that address and is configured statically with “hosts file style” entries for services. Queries for all other names are sent to the upstream DNS resolvers; see also the initial Gerrit review for SD.
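
To make the “hosts file style” entries concrete, here is a minimal Python sketch that derives them from a catalog. The hosts_entries function, the dictionary shape of the catalog, and the load balancer address (taken from the documentation range 192.0.2.0/24) are assumptions for illustration; Pontoon generates its real configuration through Puppet.

```python
from typing import Dict, List

# The production domains on which services resolve, as listed above.
DOMAINS = ("discovery.wmnet", "svc.eqiad.wmnet", "svc.codfw.wmnet")


def hosts_entries(catalog: Dict[str, dict], lb_address: str) -> List[str]:
    """Build one hosts-style line per service that has a role key."""
    lines = []
    for name, entry in sorted(catalog.items()):
        if not entry.get("role"):
            continue  # services without a role are not pointed at the LB
        names = " ".join(f"{name}.{domain}" for domain in DOMAINS)
        lines.append(f"{lb_address} {names}")
    return lines


# Hypothetical catalog: one service with a role, one without.
catalog = {"thanos": {"role": "example::role"}, "norole": {}}
for line in hosts_entries(catalog, "192.0.2.10"):
    print(line)
```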

Public services

In this context, public means the service is available at the edge of the network. Not all services are public, although any of them can be. To support public services the following concepts are introduced, some of which are useful in production too:

$public_domain variable
Indicates the "external" or "public" domain under which services are expected to run, usually in the form <service>.<public_domain>. For production the public_domain is wikimedia.org; for Pontoon stacks it is some form of third-level domain, e.g. monitoring.wmflabs.org in observability's case. The intent is to stop hardcoding 'wikimedia.org' in public service configurations, which makes realistic testing and prototyping easier. $public_domain is related to, but different from, $domain: the latter refers to the network domain and can vary between internal and external, whereas the former is a logical/administrative domain used to serve public services. The variable can and should be used in production as well as in Pontoon. See also gerrit review.

public_endpoint in service::catalog
This option is what makes a service public: the service will be available at https://$public_endpoint.$public_domain. Typically this value is the same as the service's name (e.g. thanos). The value can be useful in production as well, see also gerrit review.

Frontend

Pontoon ships with a layer seven frontend to proxy public services. The concepts introduced above are used to program the frontend according to the service’s configuration. The frontend is not coupled with service discovery or the internal load balancer and thus can be deployed standalone.

The following requirements must be met:

  • A public (floating) IP associated with the VM running the pontoon::frontend role.
  • A DNS wildcard A record for *.$public_domain resolving to the public IP.

For each public service the frontend will:

  1. Acquire a Let’s Encrypt certificate for $public_endpoint.$public_domain
  2. Redirect http://$public_endpoint.$public_domain to https://$public_endpoint.$public_domain
  3. Reverse-proxy requests to all hosts assigned to the service’s role
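
The three steps above can be sketched in Python as a per-service plan derived from the catalog. The frontend_plan function, the catalog dictionary shape, the hosts_by_role mapping, and the backend host names are hypothetical; the sketch only shows how public_endpoint and public_domain combine, not how the Pontoon frontend is actually implemented.

```python
from typing import Dict, List


def frontend_plan(
    catalog: Dict[str, dict],
    public_domain: str,
    hosts_by_role: Dict[str, List[str]],
) -> List[dict]:
    """Describe what the frontend would set up for each public service."""
    plans = []
    for name, entry in sorted(catalog.items()):
        endpoint = entry.get("public_endpoint")
        if not endpoint:
            continue  # no public_endpoint: the service is not public
        fqdn = f"{endpoint}.{public_domain}"
        plans.append({
            "service": name,
            "certificate": fqdn,                                # step 1
            "redirect": (f"http://{fqdn}", f"https://{fqdn}"),  # step 2
            "backends": list(hosts_by_role.get(entry.get("role"), [])),  # step 3
        })
    return plans


# Hypothetical inputs; monitoring.wmflabs.org is the example domain from the text.
catalog = {
    "thanos": {"role": "example::role", "public_endpoint": "thanos"},
    "internal-only": {"role": "example::other"},
}
hosts_by_role = {"example::role": ["host-a", "host-b"]}
for plan in frontend_plan(catalog, "monitoring.wmflabs.org", hosts_by_role):
    print(plan)
```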

An example change to add all observability services to service::catalog with their respective public_endpoints can be found at this review. The implementation of Pontoon’s frontend is part of the pontoon-lb Gerrit topic.

Enable SD and LB in your stack

You can activate Pontoon's SD and LB mechanisms in your stack with the following steps:

  1. Provision the Pontoon load balancer: enroll a Bullseye host with role pontoon::lb in your stack
  2. Symlink the sd_cloudvps.yaml settings (modules/pontoon/files/settings/sd_cloudvps.yaml) into your stack's hiera. Push the changes to your Pontoon server.
  3. At the next Puppet run, Pontoon service discovery will be enabled.
  4. Make sure your service is in service::catalog and has its role filled in. Once that's done you'll be able to talk to e.g. <your service>.discovery.wmnet in your Pontoon stack
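
Once the steps above are done, one quick way to confirm that discovery works is to resolve the service name from a host in the stack. The Python sketch below assumes nothing about Pontoon beyond the <service>.discovery.wmnet naming; check_discovery is a hypothetical helper, and its resolver parameter exists only so the lookup can be swapped out.

```python
import socket
from typing import Callable, Optional, Tuple


def check_discovery(
    service: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[str, Optional[str]]:
    """Resolve <service>.discovery.wmnet; return (name, address or None)."""
    name = f"{service}.discovery.wmnet"
    try:
        return name, resolver(name)
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve
        return name, None
```

Calling check_discovery("thanos") on a host in the stack should return the name together with an address, which by default would be the stack's load balancer; a None address suggests that service discovery is not yet enabled or that the service has no role in service::catalog.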