Kubernetes/Enabling TLS
We use envoy to provide TLS termination for services. It is installed as a sidecar in each pod and functions as a reverse proxy to the app. If you generated the chart using sextant, TLS support is available in your chart and disabled by default.
Create and place certificates
For staging deployments, certificates for staging.svc.eqiad.wmnet
and staging.svc.codfw.wmnet
are provided by default. You may of course override them if you need to. For production, follow these steps:
- Add the relevant production certificate to puppet
- Private Repo: edit
/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/kube_services.certs.yaml
and add a stanza for your service. It should closely mimic the existing ones and have at least the following alt_names:
$SERVICE_NAME.discovery.wmnet
$SERVICE_NAME.svc.codfw.wmnet
$SERVICE_NAME.svc.eqiad.wmnet
$SERVICE_NAME-main-tls-service.$NAMESPACE.svc.cluster.local (if you don't know what this is, you do not need it)
- Private Repo: Generate the certificates. DO NOT SET A PASSWORD. Using a password results in an encrypted key file, which envoyproxy can't use.
- Private Repo: Check the certificates we are about to generate:
$ cergen -c "$SERVICE_NAME.*" --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
- Private Repo: If everything looks OK, run the command again, adding
--generate
to create the certificate
- ONLY IF YOU SET A KEY PASSWORD do the following: we need the unencrypted key, so create it with
openssl ec -in modules/secret/secrets/certificates/$CERT_NAME/$CERT_NAME.key.private.pem -out modules/secret/secrets/certificates/$CERT_NAME/$CERT_NAME.key.private.unencrypted.pem
You will be asked for the password that you set up in cergen.
- Private Repo: Commit all the generated files to git
- Private Repo: edit
/srv/private/hieradata/role/common/deployment_server/kubernetes.yaml
and add the appropriate stanza there, for all production environments:
profile::kubernetes::deployment_server_secrets::services:
  blubberoid:
    eqiad:
      tls: &blubberoid_certs
        certs:
          # NOTE: If you set a password, use the $CERT_NAME.key.private.unencrypted.pem file you created instead.
          key: "secret(certificates/$CERT_NAME/$CERT_NAME.key.private.pem)"
          cert: "secret(certificates/$CERT_NAME/$CERT_NAME.crt.pem)"
    codfw:
      tls: *blubberoid_certs
    ...
- Private Repo: commit all your changes
- Run puppet on the deployment hosts and verify that the data written to
/etc/helmfile-defaults/private/$SERVICE_NAME/{staging,eqiad,codfw}.yaml
is correct
- Add the rest of the configuration for TLS enablement in deployment-charts under
helmfile.d/services/$SERVICE_NAME/values*.yaml
- Happy helming!
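As a quick aid for the certificate steps above, here is a minimal shell sketch. It prints the alt_names listed above for a given service and checks whether a key file looks password-protected. The function names and the example arguments are hypothetical and not part of cergen or puppet. The encryption check relies only on the markers that encrypted PEM keys carry, so treat it as a rough sanity check rather than a validation of the key.

```shell
#!/usr/bin/env bash
# Sketch only: function names and example values are illustrative assumptions.

# Print the alt_names for one service, one per line.
# The last entry is optional (see the note in the list above).
alt_names() {
  local service="$1" namespace="$2"
  printf '%s\n' \
    "${service}.discovery.wmnet" \
    "${service}.svc.codfw.wmnet" \
    "${service}.svc.eqiad.wmnet" \
    "${service}-main-tls-service.${namespace}.svc.cluster.local"
}

# Exit 0 if the PEM key file looks password-protected, non-zero otherwise.
# Encrypted PEM keys contain either a "Proc-Type: 4,ENCRYPTED" header
# or a "BEGIN ENCRYPTED PRIVATE KEY" marker.
key_is_encrypted() {
  grep -q 'ENCRYPTED' "$1"
}

# Example: assumes the namespace has the same name as the service.
alt_names blubberoid blubberoid
```

If key_is_encrypted succeeds on the generated $CERT_NAME.key.private.pem, a password was set and the unencrypted copy described above is needed.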
Deploy your chart
- Define a proper
upstream_timeout
in values.yaml
for envoy to use. The current default is 60
- Run
helmfile
(see Kubernetes/Deployments#Cheatsheet)
Enable the TLS support
Set mesh.enabled
to true under helmfile.d, per cluster if needed.
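The dotted name mesh.enabled refers to nested YAML keys. The sketch below only prints that fragment. The function name is hypothetical, and which values file under helmfile.d receives the fragment depends on whether the setting should apply to every cluster or to a single one.

```shell
#!/usr/bin/env bash
# Sketch only: prints the nested YAML form of the dotted key mesh.enabled.
# The function name is an illustrative assumption.
mesh_enabled_fragment() {
  cat <<'EOF'
mesh:
  enabled: true
EOF
}

mesh_enabled_fragment
```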
Make the service visible
Follow: Kubernetes/Ingress#Add a new service under Ingress
Outdated Documentation
Create a new LVS service for TLS enabled service
Follow LVS#Add_a_new_load_balanced_service to create a new LVS service on your newly chosen port, but on the same LVS IP as the previous one.
Switch traffic, aka switch configuration of dependent services to use the new LVS service
Things that might need to be changed:
- mediawiki-config https://gerrit.wikimedia.org/r/#/admin/projects/operations/mediawiki-config
- caching proxies configuration https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/profile/trafficserver/backend.yaml
Things to be mindful of:
- CPU and memory limits of the envoy sidecar container when more traffic starts hitting the new LVS service.
Remove the old LVS service
For this we follow the reverse of the process used to create the new LVS service. There is already a runbook at LVS#Remove_a_load_balanced_service
Things to be mindful of:
- Make sure that no traffic goes to the old service
- Make sure that downtime is scheduled for the alerts in Icinga
Decommission the non-TLS service from helm chart
The non-TLS service (template) may now be removed from the helm chart as well (freeing a nodePort).