Acme-chief is an application that came out of the Wikimedia Hackathon 2018. It is used to centrally request the configured TLS certificates from ACME servers and then make their public and private parts available to authorised API users.
See T235252 for how to set this up for a Cloud VPS project, particularly the service account creation subtask, which must be performed by the cloud administrators.
In production, acme-chief is already set up to manage production DNS, so most people probably just need to know that the certificate configuration lives in hieradata/role/common/acme_chief.yaml in operations/puppet.git.
If acme-chief is having issues, also check the Let's Encrypt status page to make sure Let's Encrypt isn't having an outage or undergoing maintenance.
Acme-chief's production environment is composed of one active instance and at least one passive instance. The hiera key profile::acme_chief::active flags an instance as active, while passive instances are listed in an array called
The active instance is responsible for running both the acme-chief service (acme-chief.service) and the puppet file API service (uwsgi-acme-chief.service and nginx.service). A passive instance only idly runs the puppet file API service.
TLS material is synchronized between instances by a oneshot systemd service, acme-chief-certs-sync.service, which a systemd timer triggers every 30 minutes. This service also runs on the active instance.
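You can check the active/passive split described above on a host with a small POSIX shell sketch. It compares the expected units against systemctl is-active. The operator passes the role as an argument, because the script does not read profile::acme_chief::active from hiera. The timer name acme-chief-certs-sync.timer is an assumption based on systemd's default pairing with acme-chief-certs-sync.service, so confirm the real name with systemctl list-timers. The script checks the timer rather than the oneshot service because the service is normally inactive between runs.

```shell
#!/bin/sh
# Sketch: report whether the units expected for a given acme-chief role are running.
# The role ("active" or "passive") is supplied by the operator, not read from hiera.
# ASSUMPTION: the sync timer is named acme-chief-certs-sync.timer.

expected_units() {
    # Both roles run the puppet file API service and the TLS sync timer.
    common="uwsgi-acme-chief.service nginx.service acme-chief-certs-sync.timer"
    case "$1" in
        active)  echo "acme-chief.service $common" ;;
        passive) echo "$common" ;;
        *)       return 1 ;;
    esac
}

check_units() {
    units=$(expected_units "$1") || { echo "usage: $0 active|passive" >&2; return 2; }
    rc=0
    for unit in $units; do
        state=$(systemctl is-active "$unit")
        printf '%-30s %s\n' "$unit" "$state"
        [ "$state" = "active" ] || rc=1
    done
    # A passive instance must not be running the acme-chief service itself.
    if [ "$1" = passive ] && [ "$(systemctl is-active acme-chief.service)" = active ]; then
        echo "acme-chief.service is running on a passive instance" >&2
        rc=1
    fi
    return "$rc"
}

# Only query systemd when a role is given on the command line.
if [ "$#" -gt 0 ]; then
    check_units "$1"
    exit $?
fi
```

For example, passing active on the active instance should list all four units as active. Passing passive on a passive instance checks the three shared units and also flags the case where acme-chief.service is unexpectedly running there.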
Replacing the active instance
- Create the new Ganeti VM.
- Set its role to acme_chief in site.pp and make sure it is listed as a passive instance on
- Once the new instance is up and running, arm keyholder for SSH access
- Trigger a puppet run on the current active instance, then trigger the TLS material sync service:
vgutierrez@cumin1001:~$ sudo -i cumin 'acmechief-test1001.eqiad.wmnet' 'systemctl start acme-chief-certs-sync.service'
- Upon completion, the new instance should have a current copy of the TLS material on
- Disable puppet on the old active instance and on every acme-chief client:
sudo -i cumin 'R:acme_chief::cert' "disable-puppet 'acmechief maintenance'"
- Stop acme-chief.service on the old active instance
- Set the new instance as active in profile::acme_chief::active and remove it from
- Run puppet on the new active instance. Once puppet finishes, acme-chief.service should be up and running on the new instance.
- Re-enable puppet on the acme-chief clients:
sudo -i cumin 'R:acme_chief::cert' "enable-puppet 'acmechief maintenance'"
- Decommission the old instance using the sre.hosts.decommission cookbook
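You can sketch the procedure above as a dry-run plan for a cumin host. It starts after keyholder has been armed on the new instance, and it prints the commands unless DRY_RUN=0 is set. Several parts are assumptions to adjust before real use:
- The default hostnames are hypothetical placeholders.
- run-puppet-agent is assumed to be the puppet wrapper available on the targets.
- The sync is assumed to be triggered on the new instance so that it pulls from the current active one.

The hiera change to profile::acme_chief::active stays manual, so the steps are split into PHASE=before and PHASE=after.

```shell
#!/bin/sh
# Dry-run sketch of the active-instance failover, meant to run on a cumin host.
# HYPOTHETICAL defaults: pass the old active host as $1 and the new instance as $2.
# Commands are only printed unless DRY_RUN=0 is set. The hiera edit is manual:
# run with PHASE=before, merge the hiera change, then run again with PHASE=after.

OLD=${1:-acmechief-old1001.eqiad.wmnet}   # hypothetical hostname
NEW=${2:-acmechief-new1001.eqiad.wmnet}   # hypothetical hostname
CLIENTS='R:acme_chief::cert'              # cumin query matching acme-chief clients
REASON='acmechief maintenance'

run() {
    # $1: cumin host selector, $2: command to run on the matched hosts
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf "sudo -i cumin '%s' \"%s\"\n" "$1" "$2"
    else
        # Stop at the first failing step so the operator can investigate.
        sudo -i cumin "$1" "$2" || { echo "step failed: $2" >&2; exit 1; }
    fi
}

before_hiera() {
    # ASSUMPTION: run-puppet-agent is the puppet wrapper installed on the hosts.
    run "$OLD" 'run-puppet-agent'
    # ASSUMPTION: the sync runs on the new instance and pulls from the active one.
    run "$NEW" 'systemctl start acme-chief-certs-sync.service'
    run "$OLD" "disable-puppet '$REASON'"
    run "$CLIENTS" "disable-puppet '$REASON'"
    run "$OLD" 'systemctl stop acme-chief.service'
    printf '# MANUAL: make %s the active instance in hiera, then rerun with PHASE=after\n' "$NEW"
}

after_hiera() {
    run "$NEW" 'run-puppet-agent'
    # Exits non-zero (and stops a real run) if acme-chief.service is not active.
    run "$NEW" 'systemctl is-active acme-chief.service'
    run "$CLIENTS" "enable-puppet '$REASON'"
    printf '# MANUAL: decommission %s with the sre.hosts.decommission cookbook\n' "$OLD"
}

case "${PHASE:-before}" in
    before) before_hiera ;;
    after)  after_hiera ;;
    *)      echo "PHASE must be 'before' or 'after'" >&2; exit 2 ;;
esac
```

For example, PHASE=before sh failover.sh OLD_HOST NEW_HOST prints the first five cumin commands (failover.sh is a hypothetical filename). After reviewing them, you can rerun the same command with DRY_RUN=0 to execute them.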