
Help:Puppet-compiler

From Wikitech

Overview

You can run the puppet catalog compiler (puppet-compiler, often abbreviated PCC) to see the results of a given Puppet configuration change without having to deploy it to servers.

This page provides instructions for this process.

Catalog compiler in integration Jenkins

There is a Jenkins job that takes a Gerrit change and runs the compiler.

Steps:

  1. Push your change to Gerrit using git-review.
  2. Go to https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/
  3. Go to "Build with parameters": https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build
  4. In the form, fill in the change number (from Gerrit) and the list of nodes.
  5. Click the Build button.
  6. Wait for the Jenkins job to finish.
  7. Check the results in the Jenkins console output.
  8. View the compiled catalogs in the web frontend. The URL structure is https://puppet-compiler.wmflabs.org/compiler_host/build_id, where:
    • compiler_host is the hostname (without the domain name) of the compiler node that Jenkins dispatched the build to. A current list of possible compiler nodes is available at https://integration.wikimedia.org/ci/label/puppet-compiler-node/
    • build_id is the unique ID of the Jenkins build (it changes with every run).
    • This link is constructed automatically and printed at the bottom of the Jenkins console output after each build.
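If you script around the Jenkins job, the results URL can be assembled from the two components described above. This is a minimal sketch: pcc_results_url is a hypothetical function name, and the link printed at the end of the console output remains the authoritative one.

```shell
# Hypothetical helper: build the PCC results URL from the compiler host
# (short hostname, no domain) and the Jenkins build id.
pcc_results_url() {
  compiler_host=$1
  build_id=$2
  # Strip any domain part in case a FQDN was passed by mistake.
  compiler_host=${compiler_host%%.*}
  printf 'https://puppet-compiler.wmflabs.org/%s/%s\n' "$compiler_host" "$build_id"
}
```

For example, pcc_results_url compiler1001 12345 (both values are placeholders) prints https://puppet-compiler.wmflabs.org/compiler1001/12345.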

Host variable override

The list of nodes (list_of_nodes) supports selecting hosts using the following methods:

  • An empty list: PCC attempts to pick one host for each node definition in site.pp.
  • A comma-separated list of hosts, e.g. example1001.eqiad.wmnet,example2001.wikimedia.org,example3001.esams.wmnet
  • A regular expression: use re: followed by a regular expression, e.g. to select all puppetmasters use re:puppetmaster.*wmnet
  • Simplified Cumin syntax: use the P:, C:, O: and R: Cumin prefixes to select hosts based on their profile, class, role or resource, e.g.:
    • select all hosts with the envoy class: C:envoy
    • select all hosts with the tlsproxy::envoy profile: P:tlsproxy::envoy
    • select all hosts with the mediawiki::appserver role: O:mediawiki::appserver
    • select all hosts using the cfssl::certificate resource: R:cfssl::certificate
  • Cumin PuppetDB backend expressions: use cumin: followed by a Cumin query using the PuppetDB grammar. Keep in mind, however, that the PuppetDB on the PCC workers only contains a subset of hosts.
  • auto: with the bare word auto, PCC looks at the files submitted in the Gerrit change and applies the equivalent of C:$class or R:$resource for each class or resource altered in the change.
  • basic: with the bare word basic, PCC tries to pick a random host from production (P:sretest) and a host from WMCS (hostname -f).
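The selection methods above are distinguished by a prefix or a bare keyword. The sketch below is purely illustrative: pcc_selector_kind is a hypothetical function, not part of PCC, and PCC's real parsing may treat edge cases (whitespace, mixed forms) differently. It only shows which documented method a given value would correspond to.

```shell
# Hypothetical illustration (not PCC code): report which documented
# selection method a list_of_nodes value corresponds to.
pcc_selector_kind() {
  case "$1" in
    '')              echo "empty: one host per site.pp definition" ;;
    re:*)            echo "regex" ;;
    cumin:*)         echo "cumin puppetdb query" ;;
    P:*|C:*|O:*|R:*) echo "simplified cumin" ;;
    auto)            echo "auto" ;;
    basic)           echo "basic" ;;
    *)               echo "explicit host list" ;;
  esac
}
```

Note that case patterns are case-sensitive, so cumin:P:idp is reported as a Cumin query rather than as simplified Cumin syntax.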

With the simplified Cumin syntax, PCC tries to select a set of hosts that covers all unique use cases of the class, profile or role, to avoid performing the same test on multiple nodes. This reduced set is chosen based on the host prefix (i.e. mw, cp, db, etc.) and the set of Puppet tags applied to the host; check the code for further details. If you pass a cumin: query, you receive all hosts in the PCC worker PuppetDB that match that query. For example, at the time of writing:

  • P:idp: The test will run on idp1001.wikimedia.org and idp-test1001.wikimedia.org
  • cumin:P:idp: The test may run on all of idp1001.wikimedia.org, idp2001.wikimedia.org, idp-test1001.wikimedia.org and idp-test2001.wikimedia.org
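The reduction described above can be pictured as keeping the first host for each combination of host prefix and tag set. The awk-based sketch below is an illustration under simplifying assumptions, not the selection code PCC uses: pcc_reduce_hosts is a hypothetical name, the prefix is taken to be the short hostname with trailing digits removed, and the tag set is supplied as a single precomputed token per host.

```shell
# Hypothetical sketch of the "one host per (prefix, tag set)" reduction.
# Input: one line per host, "<fqdn> <tag-set-token>".
pcc_reduce_hosts() {
  awk '{
    split($1, parts, "[.]")      # the first label is the short hostname
    prefix = parts[1]
    sub(/[0-9]+$/, "", prefix)   # idp1001 -> idp, idp-test1001 -> idp-test
    key = prefix " " $2
    if (!(key in seen)) {        # keep only the first host for each key
      seen[key] = 1
      print $1
    }
  }'
}
```

Fed the four idp hosts with identical tag sets, this keeps idp1001.wikimedia.org and idp-test1001.wikimedia.org, which matches the P:idp example above.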

Gerrit integration

There is an experimental feature that allows users to specify the list of nodes in the Gerrit commit message. To do this, specify your list of nodes using the keyword Hosts: followed by your list of hosts or one of the supported overrides listed above.

Make sure the list of hosts is part of the footer section (separated from the main text by a blank line, together with Bug:, Change-Id: and other footers). The commit message validator also enforces a maximum line width, so use multiple Hosts: lines if needed.

You can provide comments at the end of Hosts: lines to help identify the sets of machines:

Hosts: pc1009.eqiad.wmnet,pc2009.codfw.wmnet # pc3
Hosts: es1027.eqiad.wmnet,es2028.codfw.wmnet # es4

Once this is in place, comment check experimental on your change and Zuul will schedule a PCC run using the correct Gerrit change ID and the hosts specified.
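Before commenting check experimental, it can help to confirm locally what your Hosts: footers add up to. This is a rough local check, not the parser Zuul or PCC actually uses: pcc_hosts_from_message is a hypothetical function that collects every Hosts: line, drops trailing # comments, and joins the values with commas. It is only meaningful for explicit host lists, not for selectors such as re: or auto.

```shell
# Hypothetical local check (not the Zuul/PCC parser): collect the values of
# all Hosts: footer lines, drop trailing "# comments", and join them into
# one comma-separated list.
pcc_hosts_from_message() {
  sed -n 's/^Hosts:[[:space:]]*//p' |
    sed 's/[[:space:]]*#.*$//' |
    paste -sd, -
}
```

For the latest local commit: git log -1 --format=%B | pcc_hosts_from_message. With the two example Hosts: lines above it prints pc1009.eqiad.wmnet,pc2009.codfw.wmnet,es1027.eqiad.wmnet,es2028.codfw.wmnet.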

Updating nodes

Puppet masters can automatically send their facts data to the compiler hosts. Configured puppet masters send facts to the puppet compiler DB host (pcc-db1002.puppet-diffs.eqiad1.wikimedia.cloud) using the upload_puppet_facts systemd timer. The DB host processes the facts daily using the pcc_facts_processor systemd timer.

Manually update production

One can manually update the production facts by running the following:

$ ssh puppetmaster1001.eqiad.wmnet sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
$ ssh pcc-db1002.puppet-diffs.eqiad1.wikimedia.cloud sudo -u jenkins-deploy /usr/local/sbin/pcc_facts_processor

Manually update cloud

Projects that use the shared puppet master can update their facts by running the following commands:

$ ssh cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud sudo /usr/local/sbin/puppet-facts-upload
$ ssh pcc-db1002.puppet-diffs.eqiad1.wikimedia.cloud sudo -u jenkins-deploy /usr/local/sbin/pcc_facts_processor
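The production and Cloud VPS procedures above follow the same two-step pattern, so they can be wrapped in a single function. This is a minimal sketch rather than an existing tool: pcc_refresh_facts and PCC_DB_HOST are hypothetical names, and it assumes you have SSH access and sudo rights on both hosts. The processor is only run if the upload succeeded.

```shell
# Hypothetical wrapper around the two documented commands: upload facts
# from a puppet master, then process them on the PCC DB host.
# Usage: pcc_refresh_facts PUPPETMASTER [PROXY_URL]
PCC_DB_HOST=${PCC_DB_HOST:-pcc-db1002.puppet-diffs.eqiad1.wikimedia.cloud}

pcc_refresh_facts() {
  master=$1
  proxy=${2:-}
  if [ -n "$proxy" ]; then
    ssh "$master" sudo /usr/local/sbin/puppet-facts-upload --proxy "$proxy" || return 1
  else
    ssh "$master" sudo /usr/local/sbin/puppet-facts-upload || return 1
  fi
  # Only process the facts once the upload has succeeded.
  ssh "$PCC_DB_HOST" sudo -u jenkins-deploy /usr/local/sbin/pcc_facts_processor
}
```

For production: pcc_refresh_facts puppetmaster1001.eqiad.wmnet http://webproxy.eqiad.wmnet:8080. For the shared Cloud VPS puppet master: pcc_refresh_facts cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud.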

Projects that have their own puppet master first need to add that puppet master's public key to Puppet so that the DB server accepts its uploads. To do this, add something like the following to hieradata/cloud/eqiad1/puppet-diffs/hosts/pcc-db1002.yaml:

puppet_compiler::uploader::realms:
  deployment-prep:  # This should be the name of the horizon project
    # The below key should be the hostname of the puppet master.
    # The value should be the content of the puppet host public key
    # cat $(sudo facter -p puppet_config.hostpubkey)
    deployment-puppetmaster04: |
      -----BEGIN PUBLIC KEY-----
      MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEApuxohaA21d8YqF5vVEIB
      06kvvEeLYsHdge3CHBwS4JVMspoXkzVDHbjbCLXMRMAJ9xy3HbsGFcE0MSr17oF2
      YMACKUidt0nNdjTUJZ8wYYWa3YqRIfUhV7C7FDCclKw9Vj73Up1BwdJMC0/S1te9
      pfHbo6nRwJDATEA1UyxgWBmUnJmqevLUvygppYeEb6IcjPhJGRia1jnK3VzNgyW8
      vRr6dbx9qZjvoY/KNMCFRrjvIxk7QUJfwxg1ZlJ8drwkm0vgKDmIN8l4zXAdPkgf
      WPRp2lpanS0vqHHILnl1UlHHf4kM7Q3H6y8QQN1OQfx4VuQIOHX5rLb8OPMdkiA4
      NQSMpWiSzJI5uUnyZm0unzu3F8d6VSAN/kgtEMnnpKA7FVCuFThW0zQGtVHz9QQn
      jE1BodAATdGmsOR4cukdfZxtYOuYmWFQsyHmvgcYaO/LXfe4tjpllhWnvPQpz48k
      8TGvctenbQH/HSo/3yFsYKMFoFGTuyWiL68hv2Ot5ZtgmxPhtTCtoEmIajvYe8k1
      EH0CKL44wBQOUmOAlHdROwQauZsqa8bXQTMEzZ8k6lXz06lGY0frhngbR53naEnY
      C0gyRPFAn46QOzOJQgzMneMSVp7IN05i4IYW/1kiQOT7Ks22UEJyZhXYpTkTnuQ6
      2jK3v7JNqnd3yHHg/iCdroUCAwEAAQ==
      -----END PUBLIC KEY-----

Once the above has been updated, add role::puppetmaster::standalone::upload_facts: true to the hieradata of the project puppetmaster to enable uploads. Then run the commands at the beginning of this section to upload the first batch of data.
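Indenting the public key correctly inside the YAML block scalar is easy to get wrong. The helper below is a hypothetical sketch, not an existing tool: pcc_realm_yaml prints a stanza in the same shape as the hieradata example above, reading the PEM key from standard input. It assumes the puppet master's short hostname is the key to use, as in the deployment-puppetmaster04 example.

```shell
# Hypothetical helper: print a puppet_compiler::uploader::realms stanza for
# one project, reading the PEM public key from stdin and indenting it so it
# sits inside the YAML block scalar (|).
# Usage: pcc_realm_yaml PROJECT PUPPETMASTER_HOSTNAME < pubkey.pem
pcc_realm_yaml() {
  project=$1
  master=$2
  printf 'puppet_compiler::uploader::realms:\n'
  printf '  %s:\n' "$project"
  printf '    %s: |\n' "$master"
  # Indent every line of the key by six spaces.
  sed 's/^/      /'
}
```

On the project puppet master, something like sudo cat "$(sudo facter -p puppet_config.hostpubkey)" | pcc_realm_yaml deployment-prep deployment-puppetmaster04 prints the stanza. If pcc-db1002.yaml already defines puppet_compiler::uploader::realms, add only the project block under the existing key instead of repeating it.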

Purging nodes

Completely purging a node from PCC involves a few different components, and what is required depends on which host variable override you use. The main things at play are:

  1. The PuppetDB used by compiler-update-facts:
    • The compiler-update-facts script uses the PuppetDB API to export a list of active nodes. A node is considered active if it has submitted a report to PuppetDB in the last 14 days (at least, this is the production value).
  2. When you last ran compiler-update-facts:
    • compiler-update-facts exports the node data from PuppetDB, then rsyncs it using --delete to /var/lib/catalog-differ/puppet/yaml/, thus purging any old nodes.
  3. The PuppetDB used by the compiler:
    • The puppet compiler also has a PuppetDB instance, which expires nodes after 7 days of inactivity. However, cron jobs that run every night keep this data fresh, i.e. unless manually purged, nodes won't be removed from here until 7 days after step 2.

Which of these constraints you hit depends on the host variable override you use:

  • If using an empty list of hosts, PCC calculates the hosts from the site.pp file, so old nodes shouldn't cause issues (as long as they have been removed from site.pp). However, you may need to run compiler-update-facts for new hosts.
  • If providing an explicit list of hosts, none of this matters either, but you may need to run compiler-update-facts for new hosts.
  • If using the re: selector, PCC scans /var/lib/catalog-differ/puppet/yaml/ for hosts matching the regex, so steps 1 and 2 are required to purge the node.
  • If using any of the other selectors, PCC queries PuppetDB for matching hosts, so you need to complete all three steps to purge the node.
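To check whether a stale node would still be picked up by the re: selector on a compiler host, you can approximate the scan yourself. This is a hypothetical sketch, not PCC code: pcc_re_candidates assumes the fact files under /var/lib/catalog-differ/puppet/yaml/ are named after the host FQDN with a .yaml suffix (for example in a facts/ subdirectory, as in Puppet's usual yamldir layout), and it uses grep -E substring matching, which may not match exactly how PCC applies the regex.

```shell
# Hypothetical approximation of the re: selector: list hosts whose YAML
# files under the compiler's directory match an extended regex.
# Assumes files are named <fqdn>.yaml.
# Usage: pcc_re_candidates REGEX [YAML_DIR]
pcc_re_candidates() {
  regex=$1
  dir=${2:-/var/lib/catalog-differ/puppet/yaml}
  find "$dir" -type f -name '*.yaml' |
    sed 's|.*/||; s|\.yaml$||' |   # keep only the file name without .yaml
    grep -E -- "$regex" |
    sort -u
}
```

If a decommissioned node still appears in the output, step 2 has not yet taken effect on that compiler host.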

Catalog compiler for CloudVPS

The standard Jenkins-hosted catalog compiler can now target VPS instances. Because VMs are frequently created and deleted, you may need to update the facts from whichever puppetmaster hosts the VM in question. Instructions for doing that can be found at Nova Resource:Puppet-diffs.

The hostname to use for the VM is whatever the puppetmaster thinks the host is called, which is usually the output of hostname -f.

Catalog compiler local run (pcc utility)

There is also a tool called pcc in the utils directory of the operations/puppet repository. You'll need your Jenkins API token to make it work; you can retrieve it at https://integration.wikimedia.org/ci/user/$YOURUSERNAME/configure.

Example:

$ ./utils/pcc GERRIT_CHANGE_NUMBER LIST_OF_NODES --username YOUR_USERNAME --api-token 12312312312312313  
$ ./utils/pcc 282936 oxygen.eqiad.wmnet --username batman --api-token 12312312312312313

--username and --api-token can be omitted if JENKINS_USERNAME and JENKINS_API_TOKEN are set in the current environment.
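When relying on the environment variables, a small wrapper can fail early if the credentials are missing instead of letting the request to Jenkins fail. This is a hypothetical sketch: pcc_run and PCC_BIN are illustrative names, and the pcc utility itself is what reads JENKINS_USERNAME and JENKINS_API_TOKEN, as described above.

```shell
# Hypothetical wrapper: refuse to run if the Jenkins credentials are not in
# the environment, otherwise call the pcc utility (which reads
# JENKINS_USERNAME and JENKINS_API_TOKEN itself) with all arguments.
# PCC_BIN defaults to ./utils/pcc, i.e. run from the operations/puppet root.
pcc_run() {
  if [ -z "${JENKINS_USERNAME:-}" ] || [ -z "${JENKINS_API_TOKEN:-}" ]; then
    echo "set JENKINS_USERNAME and JENKINS_API_TOKEN first" >&2
    return 1
  fi
  "${PCC_BIN:-./utils/pcc}" "$@"
}
```

For example, after export JENKINS_USERNAME=batman and export JENKINS_API_TOKEN=12312312312312313, run pcc_run 282936 oxygen.eqiad.wmnet. If you pass one of the selectors from Host variable override (presumably accepted here as well, since pcc forwards the list of nodes to the same Jenkins job), quote it, e.g. 're:puppetmaster.*wmnet', so the shell does not expand the *.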

Troubleshooting

Some common errors and mistakes.

  • Catalog for Cloud VPS instances doesn't get any classes/roles.

This happens because $::realm is not set to labs. There are patches in place to fix this, but the puppet-compiler software needs to be released with these patches.

  • ERROR: Unable to find facts for host tools-services-01.tools.eqiad1.wikimedia.cloud, skipping

If running locally, collect facts by hand from the corresponding puppetmaster. If running in the Jenkins web service for a production host, follow the instructions in the Updating nodes section above.

  • CRITICAL: Build run failed: [Errno 28] No space left on device

See Nova Resource:Puppet-diffs/Documentation#Out of disk space.

Limitations

The puppet-compiler mechanism won't discover every issue in the resulting catalog. Even if Jenkins compiled the catalog successfully, you may still find issues when running the puppet agent.

Some known limitations:

  • File sources. When declaring a file resource such as file { '/my/file': with a source parameter, the path you specify in source is resolved at puppet agent runtime.
  • Private Hiera lookups. The way Hiera fetches data may differ between the puppet-compiler process and the final puppet master, specifically for secrets in the private repo.
  • Hiera behavior. Currently, there is no concrete way to know how Hiera behaves when compiling the catalog. See Phabricator ticket T215507 for more information.


Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

See also