
Puppet/Pontoon

From Wikitech

Reach out to godog / Filippo Giunchedi on #wikimedia-sre for more information and assistance.

Intro

Pontoon enables you to (re)create isolated and disposable copies of production in Cloud VPS. The intended audience is SRE folks, although anyone with Cloud VPS access can follow the instructions; no special access is required. In this context, "copies of production" means that the effect of your changes to puppet.git will be as close as possible to what will happen in production (for example, role variables in hieradata/role/ work as expected, unlike plain Cloud VPS). For more details see the rationale page.

Installation and quickstart

The prerequisites to run Pontoon are the following:

You can find the installation instructions in Pontoon's README.md under modules/pontoon. The documentation will guide you through setting up pontoonctl (the CLI), creating your first Pontoon stack, or joining an existing stack.

Howto

This section contains miscellaneous instructions on common Pontoon operations.

Make roles work in Pontoon

Read this section if you have added a new role to your stack and you are sad (e.g. Puppet fails).

There are a few failure classes to think about:

  1. Undefined hiera variables. Check the common hiera settings files in modules/pontoon/files/settings for the missing values. If the values are not set already you’ll need to add them; see the Hiera section on how to do that.
  2. Services on the host are unhealthy. Common causes: the service’s dependencies haven’t been bootstrapped yet (e.g. missing databases or users), the service can’t reach its dependencies (see also the Services page for details on services in Pontoon), or private material is missing (TODO section on private).
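For the first failure class, a recursive grep quickly shows whether a missing key is already provided by one of the shared settings files, so that you can include that file in your stack instead of writing the value yourself. This is a sketch assuming you run it from the root of a puppet.git checkout; find_hiera_key is a hypothetical helper, and it only matches keys written at the start of a line (nested keys are not found).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the shared Pontoon settings files that define a hiera key.
# Assumes the current directory is the root of a puppet.git checkout.
# Only top-level keys ("KEY:" at the start of a line) are matched.
find_hiera_key() {
  local key=$1 dir=${2:-modules/pontoon/files/settings}
  # -r: recurse into the directory, -l: print matching file names only
  if ! grep -rl -- "^${key}:" "$dir"; then
    echo "not set in $dir: $key (add it to your stack's hiera)" >&2
    return 1
  fi
}
```

For example, find_hiera_key profile::example::setting (a made-up key) either prints the settings files that define it or reports that it still has to be added.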

Debugging and fixing these issues will also help find production bugs: the same errors would likely show up when reimaging a host or porting a role to a new Debian release. Typical bootstrap problems are directory initialization (puppetdb, trafficserver) or service dependencies (trafficserver). Keep in mind that fixing some of these issues might require hacks, and that's okay: Pontoon is not production, and a hack that enables automation is preferable to manually bootstrapping and fixing services.

Load balancing and service discovery

Pontoon provides a production-compatible load balancing and service discovery layer. When it is enabled, you'll be able to reach svc.<site>.wmnet and discovery.wmnet DNS names backed by the services in your stack. See also the Services page for more information on this topic and on how to enable LB/SD.

"Reimage" an host

To reprovision/reimage an host in your stack use the following steps:

  1. Provision a new VM with the same specs and increment the trailing index, e.g. o11y-puppet-01 becomes o11y-puppet-02.
  2. Replace the old hostname with the new one in rolemap.yaml and push the change to your Pontoon stack.
  3. Enroll the new host in the stack
  4. Delete the old host

The above is necessary because Horizon's "rebuild" feature doesn't seem to work.
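Steps 1 and 2 are mechanical enough to script. The sketch below computes the next hostname by incrementing the trailing index (keeping its zero padding) and swaps it into rolemap.yaml. It assumes hostnames end in -NN, that rolemap.yaml lists hosts by that same name, and GNU sed (for -i and \b); next_hostname and swap_host are hypothetical helpers. Creating the VM, enrolling it and deleting the old host still happen as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 1-2 of the "reimage" procedure.
# Assumes hostnames end in -NN and GNU sed is available.

# o11y-puppet-01 -> o11y-puppet-02, preserving the width of the trailing index.
next_hostname() {
  local host=$1
  local prefix=${host%-*} index=${host##*-}
  # 10# forces base 10, so an index such as "08" is not parsed as octal
  printf '%s-%0*d\n' "$prefix" "${#index}" "$((10#$index + 1))"
}

# Replace OLD with NEW in rolemap.yaml, using word boundaries so that
# e.g. o11y-puppet-01 does not match inside o11y-puppet-010.
swap_host() {
  local old=$1 new=$2 rolemap=${3:-rolemap.yaml}
  sed -i "s/\\b${old}\\b/${new}/g" "$rolemap"
}
```

For example, swap_host o11y-puppet-01 "$(next_hostname o11y-puppet-01)" rewrites rolemap.yaml in place; review the diff before pushing it to your stack.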

Get patches ready for review

Before sending a patch for review you will iterate locally by pushing to your Pontoon stack. Ideally the stack can run off the production branch with no further patches; however, in many cases you will have multiple commits on top of production to get your stack to work (normally published in sandbox/ namespace branches). Therefore, once you have finished iterating on a patch, you'll need to switch your patch's "base" from your stack's branch back to production, to avoid sending all the stack-specific commits for review along with your patch. The commands to achieve that are the following, assuming your stack's branch is pontoon-STACK:

 # Run these from your patch's branch.
 # To put the pontoon-STACK commits "below" your patch instead of production:
 git rebase -i --onto pontoon-STACK production
 # To restore production as the patch's "base":
 git rebase -i --onto production pontoon-STACK
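To see what the two rebases do without touching a real checkout, the sketch below rehearses them in a throwaway repository, using the non-interactive form of each command. The branch names production and pontoon-STACK come from the text; my-patch and the commit contents are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Rehearse the "toggle base" workflow in a throwaway repository.
# my-patch and the commit names are hypothetical, for illustration only.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/production
git config user.email demo@example.org
git config user.name demo

# Create a one-file commit whose subject is the file name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base                          # production history
git checkout -q -b pontoon-STACK
commit stack-hack                    # stack-specific commit, not meant for review
git checkout -q -b my-patch production
commit my-change                     # the patch being developed

# Put the stack commits below the patch.
git rebase -q --onto pontoon-STACK production my-patch
git log --format=%s production..my-patch   # prints my-change, then stack-hack

# Restore production as the base before sending the patch for review.
git rebase -q --onto production pontoon-STACK my-patch
git log --format=%s production..my-patch   # prints my-change only
```

The upstream argument of the second rebase is pontoon-STACK, so exactly the commits reachable from it (the stack-specific ones) are dropped and only your patch is replayed onto production.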

The pontoon-STACK branch might itself need to be rebased on top of production first (and pushed to the sandbox/ namespace if applicable) to make sure it is up to date.
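Refreshing the stack branch can be wrapped in a small function. This is a sketch assuming the Gerrit remote is called origin and that the branch is published under a sandbox/ ref such as sandbox/filippo/pontoon-o11y; refresh_stack_branch is a hypothetical name. On conflicts it stops and leaves the rebase for you to resolve.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebase a stack branch on the latest production and
# force-push it to its sandbox/ ref. The remote name defaults to "origin" (assumption).
refresh_stack_branch() {
  local branch=$1 sandbox_ref=$2 remote=${3:-origin}
  # Update the remote-tracking branch for production.
  git fetch -q "$remote" production
  # Replay the stack-specific commits on top of the fetched production.
  if ! git rebase -q "$remote/production" "$branch"; then
    echo "rebase of $branch failed; resolve the conflicts, then push again" >&2
    return 1
  fi
  # Full refs/heads/ path so the sandbox branch is created if it does not exist yet.
  git push -q --force "$remote" "$branch:refs/heads/$sandbox_ref"
}
```

For example: refresh_stack_branch pontoon-o11y sandbox/filippo/pontoon-o11y (the local branch name is illustrative).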

Finally, the stack-specific changes (i.e. in modules/pontoon/files/STACK/) are expected to be pushed to Gerrit and merged into the production branch, to allow for multi-team collaboration.

Hiera

One of the key goals of Pontoon's "look and feel" is to be as close as possible to production. To this end, there are two guidelines to keep in mind when writing your stack’s hiera:

  • Minimal: only variables differing from production should be in your stack’s hiera (e.g. resource limits). If you are setting a variable with the same value as production, include it in production only and not in your stack.
  • Generic: group your hiera settings files by the functionality they enable. Shared settings files are also available to be included in your stack for common functionality (e.g. puppetdb.yaml, prometheus.yaml, etc.).

Caveats and limitations

Writing a stack’s hiera can be as straightforward as setting a few variables; however, there are some caveats to keep in mind:

  • Replace lists of hostnames with their role when possible. To do so, use “%{alias('__hosts_for_role_ROLE')}” as your variable’s value. The value will be expanded at lookup time to the list of hosts running ROLE in rolemap.yaml. The full list of available variables can be found in /etc/puppet/hieradata/auto.yaml on the Pontoon server (the file is updated by a Puppet run on any host). Avoiding hardcoded hostnames keeps hiera settings independent of any particular stack, and thus shareable with other stacks. There's also a crude "master election": “%{alias('__master_for_role_ROLE')}” expands to a string with the first host running ROLE in rolemap.yaml.
  • Double colons in role names need to be replaced with double underscores.
  • Only one role at a time can be expanded and used as a value: the alias function call must be the only value. Role host-list variables can't be concatenated from within hiera.
  • Sometimes you’ll have to hardcode hostnames, for example in nested data structures where each host in a role is a hash key.
  • Host lists can't be interpolated via alias(): for example, a variable requiring a list of host:port pairs will need hardcoded hostnames, unless the port can be split into its own variable.
  • Per-host hiera overrides are available; however, generic settings are preferred.
  • You will have to compromise on which production features to enable. This problem usually shows up when first porting your role(s) to Pontoon. Ideally your stack enables all production (sub)systems that are relevant to you. Sometimes, though, having all subsystems available is not possible or practical. In these cases consider disabling the system/feature via your stack’s hiera. TODO include examples
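The naming rules above can be captured in a tiny helper that prints the alias() value for a role. This is a hypothetical convenience (role_alias is not part of Pontoon), assuming ROLE is spelled as it appears in rolemap.yaml; it covers both the __hosts_for_role_ and __master_for_role_ forms.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hiera alias() value for a Pontoon role.
# Usage: role_alias hosts|master ROLE
role_alias() {
  local kind=$1 role=$2
  case $kind in
    hosts|master) ;;
    *) echo "kind must be 'hosts' or 'master'" >&2; return 1 ;;
  esac
  # Double colons in role names become double underscores.
  printf "%%{alias('__%s_for_role_%s')}\n" "$kind" "${role//::/__}"
}
```

For example, role_alias hosts foo::bar prints %{alias('__hosts_for_role_foo__bar')}, which would be the entire (quoted) value of a hiera key such as profile::example::hosts; both the role and the key are made up here.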

Lookup order

Your stack’s hiera sits above production and thus takes precedence over it. All other production functionality (e.g. role lookups) will be performed as usual. The relevant files and paths (in the order they are looked up, first match wins) are the following:

modules/pontoon/files/STACK/hiera/hosts/
This path allows for host-specific hiera settings if desired. As in production, HOSTNAME.yaml will be searched for hiera settings.
modules/pontoon/files/STACK/hiera/
This is the main path for hiera overrides for your STACK, and it takes precedence over production hiera. All *.yaml files in this directory will be searched for variables, irrespective of their name. Files are typically named after the general area/service they affect and/or the feature they enable. Some files are generic and shared among stacks via symlinks; for example, puppetdb.yaml contains the minimal settings for a functional PuppetDB in Pontoon and is a symlink to the shared puppetdb.yaml.
hieradata/pontoon.yaml
Common to all Pontoon stacks, changes to this file are not needed in most circumstances.
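When a value is not what you expect, it can help to find which Pontoon-level file provides it first. The sketch below walks the three locations in the order listed above. It assumes you run it from the puppet.git root, matches only top-level KEY: lines, does not look at production hiera, and checks the stack's *.yaml files in alphabetical order (if a key appears in more than one of them, inspect all matches). pontoon_hiera_source is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first Pontoon-level file that defines KEY,
# following the lookup order host file -> stack hiera/*.yaml -> hieradata/pontoon.yaml.
pontoon_hiera_source() {
  local key=$1 stack=$2 host=$3
  local base=modules/pontoon/files/$stack/hiera
  local f
  for f in "$base/hosts/$host.yaml" "$base"/*.yaml hieradata/pontoon.yaml; do
    # -f skips missing files (and an unexpanded glob when no *.yaml exists)
    if [ -f "$f" ] && grep -q -- "^${key}:" "$f"; then
      echo "$f"
      return 0
    fi
  done
  echo "$key is not set at the Pontoon level; production hiera applies" >&2
  return 1
}
```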

Team collaboration and git branches

A Pontoon stack is likely to be shared among multiple people, often in the same team. Ideally we would run an unmodified production branch on the Pontoon server; however, there are exceptions that warrant stack-specific branches. As of March 2021 the workflow for such branches is the following:

  1. The branch is pushed under the sandbox/ namespace, to allow for force-pushes. For example, sandbox/filippo/pontoon-o11y is the branch for the observability stack. Note that you'll also need to grant access to ldap/ops at https://gerrit.wikimedia.org/r/admin/repos/operations/puppet,access
  2. Such branches should be periodically rebased on top of production and force-pushed. Note that the Pontoon server will also rebase its local production branch to keep up with updates. As with any self-hosted Puppet server, the rebase can fail, so it is important to keep the sandbox branches rebased.
  3. The stack branch is force-pushed as production to the Pontoon server, as explained in the Howto section.

Keep in mind that a Pontoon master has the "auto rebase" feature enabled: the git repository will periodically try to rebase itself on production (like a standalone puppet master). Therefore it is important to keep stack-specific (sandbox) branches rebased periodically too or the auto rebase process will fail if there are conflicts.
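To catch a stale stack branch before the auto rebase runs into it, you can count how many production commits the branch is missing. A sketch, with stack_branch_status as a hypothetical helper; pass origin/production (after a git fetch) as the second argument if your local production branch may itself be out of date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a stack branch sits on top of production.
# Returns 0 if up to date, 1 if behind, 2 if git fails (e.g. unknown branch).
stack_branch_status() {
  local branch=$1 upstream=${2:-production}
  local behind
  # Commits reachable from upstream but not from the stack branch.
  behind=$(git rev-list --count "$branch..$upstream") || return 2
  if [ "$behind" -eq 0 ]; then
    echo "$branch is up to date with $upstream"
  else
    echo "$branch is $behind commit(s) behind $upstream; rebase it and force-push"
    return 1
  fi
}
```

A count of zero means production is an ancestor of the stack branch, so the branch can be pushed as-is; any other value means a rebase (and possibly conflict resolution) is due.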