User:Jbond/Encryption
This page attempts to document how encryption is utilised in the Foundation production environment, i.e. service-to-service encryption.
Puppet CA
The vast majority of services that use encryption do so over TLS, and often make use of the certificates issued by the Puppet CA. The Puppet CA is [ab]used because every server in the production environment already has a certificate issued from a centrally managed location. This makes it easy to issue and revoke certificates; however, it also makes it more difficult to compartmentalise different security domains.
Services wishing to support client authentication with the certificates issued via the Puppet CA will need to add the Puppet CA public certificate to their list of trusted root certificates. Puppet ensures that the CA public certificate is available on all nodes at the following locations:
- /var/lib/puppet/ssl/certs/ca.pem (i.e. the output of sudo puppet config print localcacert)
- /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
- /etc/ssl/certs/Puppet_Internal_CA.pem (symlink managed by update-ca-certificates)
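As a quick sanity check, one can inspect the installed CA certificate and verify that the node's own puppet certificate chains to it (a minimal sketch using the standard paths listed above):

# show the subject and expiry of the Puppet CA certificate
openssl x509 -in /etc/ssl/certs/Puppet_Internal_CA.pem -noout -subject -enddate
# verify that this host's puppet certificate was signed by the Puppet CA
sudo openssl verify -CAfile /etc/ssl/certs/Puppet_Internal_CA.pem "$(sudo puppet config print hostcert)"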
Clients performing TLS authentication will need to be configured with the public and private client certificates issued by puppet. The content of these files is not managed by the puppet repo; they are instead issued and stored on the server when it is first imaged. You can use the following commands to show the location of the public and private certificates.
puppet config print hostcert
puppet config print hostprivkey
However, these locations are only accessible to the puppet and root users. As such it is common for services to copy these files into their own configuration directory via base::expose_puppet_certs
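As an illustration, the exposed puppet agent certificates can be used directly for client authentication with a tool such as curl; the service URL below is purely a hypothetical example:

# client-authenticated request using the puppet agent certificates (hypothetical service URL)
sudo curl --cacert /etc/ssl/certs/Puppet_Internal_CA.pem \
     --cert "$(sudo puppet config print hostcert)" \
     --key "$(sudo puppet config print hostprivkey)" \
     https://some-service.discovery.wmnet/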
Services
Backups (bacula)
Bacula clients are configured to use TLS with the puppet agent certificates exposed via base::expose_puppet_certs, and they are configured to verify their peer. The backup server is also configured with TLS but is configured not to verify the peer; connections are, however, also secured with a password.
Debmonitor
The debmonitor server is available at https://debmonitor.wikimedia.org/ using the GlobalSign wildcard certificate. This is served via the caching infrastructure, which fetches the actual content from https://debmonitor.discovery.wmnet; this endpoint is configured with a certificate issued via cergen. The site is configured to perform SSL client verification and sets appropriate headers which are later used by the Django application. On the client side the puppet agent certificates are exposed using base::expose_puppet_certs for debmonitor-client and configured via /etc/debmonitor.conf
Kafka Broker
The Kafka broker is configured to use a certificate generated by cergen, much like the services in the sslcert::certificate list below. However, Kafka is a Java daemon and as such needs a Java keystore file, which is not currently supported by sslcert::certificate. Therefore the profile::kafka::broker class takes care of copying the Java keystore file from the private repo. Kafka is also configured to request SSL client auth, trusting certificates signed by the Puppet CA. The following clients make use of the Kafka broker (see the sketch after the list):
- varnishkafka via profile::cache::kafka::webrequest and profile::cache::kafka::eventlogging, which both include profile::cache::kafka::certificate
- rsyslog
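As an illustration of what such a client does, a tool like kafkacat can talk to a broker over TLS with client authentication using librdkafka's SSL settings; the broker hostname, port and certificate paths below are assumptions rather than the exact production values:

# list broker metadata over TLS with client auth (hostname, port and cert paths are assumptions)
kafkacat -L -b kafka-broker1001.eqiad.wmnet:9093 \
  -X security.protocol=ssl \
  -X ssl.ca.location=/etc/ssl/certs/Puppet_Internal_CA.pem \
  -X ssl.certificate.location=/etc/kafkacat/client.crt.pem \
  -X ssl.key.location=/etc/kafkacat/client.key.pem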
Kubernetes
The kube-apiserver service is configured with a certificate generated by cergen and copied into place using sslcert::certificate. The kubernetes clients are also configured with SSL client certificates issued via puppet, copied into place using the k8s::ssl class instead of the expose_puppet_certs resource. However, the kube-apiserver is not configured with client-ca-file, so it is unclear if client authentication is being performed.
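One way to check whether the apiserver actually requests client certificates is to look at the TLS handshake: if client auth is enabled the server sends a certificate request listing acceptable CA names. The hostname and port below are assumptions used purely for illustration:

# if the apiserver requests client auth, the output lists "Acceptable client certificate CA names"
openssl s_client -connect kubemaster.svc.eqiad.wmnet:6443 \
  -CAfile /etc/ssl/certs/Puppet_Internal_CA.pem </dev/null | grep -A3 'client certificate CA'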
Rsync Stunnel
The rsync class has an option to wrap rsync traffic in an stunnel. If this is enabled then the server will be configured with an stunnel daemon which proxies connections to the rsync process. The cron job or systemd timer used to actually rsync the data will also need to be configured as an stunnel client.
Rsyslog
Rsyslog runs on every machine and sends logs to the syslog::centralserver role. Both the central server and the sending nodes are configured to use the puppet agent certificates to establish a TLS connection, and use the Puppet CA as the trusted authority.
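The TLS listener on the central server can be checked like any other Puppet-CA-backed endpoint; the hostname below is an assumption, and 6514 is the conventional syslog-over-TLS port:

# confirm the central syslog server presents a certificate that validates against the Puppet CA
openssl s_client -connect centrallog1001.eqiad.wmnet:6514 \
  -CAfile /etc/ssl/certs/Puppet_Internal_CA.pem </dev/null | grep 'Verify return code'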
We also have a number of services configured with standard SSL server certificates. As these certificates are signed by an internal CA they are not for public consumption and are often not used by humans. The most common use of these certificates is to allow the caching and LVS layers to talk securely to the backend services. Services wishing to have a server certificate will first need to generate the certificate using cergen.
Cergen automatically copies files into the private repo under /srv/private/modules/secret/secrets/certificates. However, sslcert::certificate expects the public certificate to exist in the public puppet repo in the files directory, and the private key to exist in the private repo under the ssl folder.
Services
sslcert::certificate
sslcert::certificate is a puppet resource used by many services to copy a certificate from the puppet private repo. In the majority of cases the certificate copied is one generated with cergen; however, it could also be a commercial certificate, such as the wildcard GlobalSign certs. These certificates are often used to provide SSL endpoints for internal services. The following is a list of services which use sslcert::certificate directly:
- etcd: _etcd-server-ssl._tcp.svc.${::site}.wmnet
- Ganeti: ganeti01.svc.${::site}.wmnet
- Swift: ms-fe01.svc.${::site}.wmnet
- profile::docker_registry_ha::registry: docker-registry.discovery.wmnet
- profile::openldap_corp & profile::docker::registry: ldap-corp.${::site}.wikimedia.org
profile::tlsproxy::envoy
profile::tlsproxy::envoy has similar functionality to tlsproxy::localssl but uses Envoy instead of nginx. It uses sslcert::certificate to distribute certificates and is intended as a replacement for tlsproxy::localssl. The following roles use this profile:
- etherpad - etherpad.discovery.wmnet
- grafana - grafana.discovery.wmnet
- graphite::production - graphite.discovery.wmnet
- logstash - kibana.discovery.wmnet
- logstash7 - kibana.discovery.wmnet
- mediawiki::maintenance - mwmaint.discovery.wmnet
- microsites::peopleweb - peopleweb.discovery.wmnet
- otrs - ticket.discovery.wmnet
- phabricator - phabricator.discovery.wmnet
- planet - planet.discovery.wmnet
- puppetboard - puppetboard.discovery.wmnet
- releases - releases.discovery.wmnet
- restbase::production - restbase.discovery.wmnet
- requesttracker - rt.discovery.wmnet
- wdqs - wdqs.discovery.wmnet
- webperf::processors_and_site - performance.discovery.wmnet
- webserver_misc_apps - webserver-misc-apps.discovery.wmnet
- webserver_misc_static - webserver-misc-static.discovery.wmnet
tlsproxy::localssl
tlsproxy::localssl is used to create a TLS-terminating reverse proxy in front of a service listening on localhost. It is often used to configure backend services which will be fetched by the caching layer. In this case it uses the sslcert::certificate resource to distribute a certificate created by cergen and signed by the Puppet CA; however, it is also used to distribute certificates created by acme-chief, which are signed by the Let's Encrypt CA. The following services use tlsproxy::localssl:
- sslcert::certificate
- maps - kartotherian.discovery.wmnet
- jobrunner - jobrunner.svc.${::site}.wmnet & jobrunner.discovery.wmnet
- videoscaler - videoscaler.svc.${::site}.wmnet & videoscaler.discovery.wmnet
- cirrus - search.svc.codfw.wmnet
- maps
profile::cache::ssl::unified
The profile::cache::ssl::unified class wraps tlsproxy::localssl and is primarily used to configure the production caching servers with the GlobalSign wildcard certificate.
profile::mediawiki::webserver
The profile::mediawiki::webserver class wraps tlsproxy::localssl and is used to configure TLS termination on the MediaWiki servers. The following services use this class:
- parsoid - parsoid.svc.${::site}.wmnet
- appserver - appservers.svc.${::site}.wmnet
- api server - api.svc.eqiad.wmnet
Apache Traffic Server (ATS)
In order to ensure cross-DC traffic is encrypted we configure the ATS servers to talk to the origin servers (the applayer) via TLS. We configure proxy.config.ssl.client.CA.cert.path to /etc/ssl/certs and proxy.config.ssl.client.CA.cert.filename to Puppet_Internal_CA.pem. This allows ATS to connect to and validate backend services such as grafana.discovery.wmnet, which uses Envoy for TLS termination and exposes a certificate signed by the Puppet CA, distributed via sslcert::certificate.
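The same validation that ATS performs can be reproduced by hand against one of the backends mentioned above, for example (a sketch, run from a host that can reach the discovery address):

# validate grafana's backend certificate against the Puppet CA, as ATS does
curl --cacert /etc/ssl/certs/Puppet_Internal_CA.pem -s -o /dev/null -w '%{http_code}\n' \
  https://grafana.discovery.wmnet/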
Conftool
The conftool client does not configure a CA bundle to use. As conftool uses python3-etcd, which in turn uses urllib3, this effectively means that SSL validation is disabled.
MariaDB
Our databases are configured with TLS-enabled endpoints using the puppet agent certificates exposed via base::expose_puppet_certs. Our config does not set ssl-mode, so it is configured with the default "preferred" option. This means clients will attempt to connect over TLS but will fall back to an unencrypted connection if that is not possible.
Currently TLS is used for mariadb-to-mariadb connections, for things like replication, as well as by some ad hoc clients such as the Prometheus exporter.
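To confirm whether a given connection actually negotiated TLS, one can check the session status from the client; the hostname below is a hypothetical example:

# request TLS and check which cipher (if any) the session negotiated (hostname is hypothetical)
mysql --ssl -h db1001.eqiad.wmnet -e "SHOW SESSION STATUS LIKE 'Ssl_cipher'"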
Mcrouter
Mcrouter does not use the Puppet CA; instead it has its own CA managed by cergen (more information can be found on the mcrouter page). Mcrouter in eqiad establishes TLS connections to 4 mcrouters in codfw to allow cross-DC replication when needed, and vice versa. TLS is not used from client to mcrouter or from mcrouter to memcached.
Hadoop
Hadoop does not use the Puppet CA; instead it has its own CA managed by cergen. The certificates issued by the Hadoop CA are used to provide TLS authentication for the connections between mappers/shufflers/reducers during MapReduce jobs.
Cassandra
Cassandra does not use the Puppet CA; instead it has its own CA managed by a Python script, cassandra-ca-manager (a precursor to cergen). The certificates issued by the Cassandra CA are used to provide TLS authentication for the connections between the Cassandra nodes.