Kubernetes/Add a new service

All steps below assume you want to deploy a new service named service-foo to the clusters of the main (wikikube) group.

The ml-serve and dse-k8s groups are modelled closely on the main clusters, so many of the steps below also apply to those groups; where there are specific differences, they are highlighted in the relevant steps.

service-foo can be made reachable from outside of Kubernetes via either Kubernetes Ingress or LVS. The method you choose affects some of the steps below.

Prepare the clusters for the new service

Tell the deployment server how to set up the kubeconfig files.

This is done by modifying the profile::kubernetes::deployment_server::services hiera key (hieradata/common/profile/kubernetes/deployment_server.yaml) as in the example below:

profile::kubernetes::deployment_server::services:
  main:
    mathoid:
      usernames:
        - name: mathoid
        - name: mathoid-deploy
...
+    service-foo:
+      usernames:
+        - name: service-foo
+        - name: service-foo-deploy

Please note that the file permissions of your kubeconfig file (/etc/kubernetes/service-foo-<cluster_name>.config) are inherited from the defaults at profile::kubernetes::deployment_server::user_defaults. Typically you won't need to override them. If you do need to, you can specify the keys owner, group and mode for each element in the usernames array.
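For illustration, a hypothetical override of those keys could look like this (the owner/group/mode values below are made up; normally you leave them out entirely and inherit the defaults):

profile::kubernetes::deployment_server::services:
  main:
    service-foo:
      usernames:
        - name: service-foo
          # hypothetical overrides; omit them to inherit the defaults from
          # profile::kubernetes::deployment_server::user_defaults
          owner: service-foo
          group: service-foo
          mode: "0440"
        - name: service-foo-deploy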

Warning: Make sure to apply the puppet changes (puppet-merge and then the appropriate run-puppet-agent as detailed below) before running helm/helmfile. Otherwise things might look okay at first sight but end up in a broken state.
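A sketch of that sequence (host names are generic here; use the appropriate puppetserver and deployment server):

# on a puppetmaster/puppetserver host, after the change is merged in Gerrit
sudo puppet-merge
# on the deployment server (e.g. deploy1002)
sudo run-puppet-agent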

Add a Kubernetes namespace

Namespaces are used to isolate kubernetes services from each other.

In order to create a new namespace, prepare a change to the relevant values file in the deployment-charts repo.

For the wikikube clusters this is helmfile.d/admin_ng/values/main.yaml; namespaces for the ml-serve, dse-k8s, and aux-k8s cluster groups are managed in their own files. Here is an example commit for adding a namespace to the wikikube clusters.
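As a minimal sketch (the exact per-namespace options are defined by the admin_ng charts; use the example commit linked above as the authoritative reference), the new namespace goes under the namespaces key of that values file:

namespaces:
  # ... existing namespaces ...
  service-foo: {}  # per-namespace overrides (e.g. quotas) can be added here if needed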

At this point, you can safely merge the changes (after somebody from SRE/Service_Operations validates).

After merging, it is important to deploy your changes to avoid impacting other people rolling out changes later on.

Deploy changes to helmfile.d/admin_ng

The following example shows how to deploy these changes to the wikikube clusters. If you are working with a different cluster group, substitute the relevant environment names.

ssh to deploy1002 and then run the following:

sudo run-puppet-agent
sudo -i
cd /srv/deployment-charts/helmfile.d/admin_ng/
helmfile -e staging-codfw -i apply
# if that went fine
helmfile -e staging-eqiad -i apply
helmfile -e codfw -i apply
helmfile -e eqiad -i apply

Each command above should show you a diff (namespaces, quotas, etc.) related to your new service. If you don't see a diff, ping somebody from the Service Ops team! Check that everything is OK:

sudo -i
kube_env admin staging-codfw
kubectl describe ns service-foo

You should be able to see info about your namespace.

Remember to deploy to the staging-eqiad, eqiad and codfw clusters even if you aren't ready to fully deploy your service.
Leaving changes undeployed will impede later deployments by other people.

Create certificates (for the services proxy)

Manual creation/management of certificates is no longer required as of task T300033. Automatic cert management is enabled by default.

Add private data/secrets (optional)

Ask Service Ops to add the private data for your service.

This is done by adding an entry for service-foo under profile::kubernetes::deployment_server_secrets::services in the private repository (hieradata/role/common/deployment_server/kubernetes.yaml). Secrets will most likely be needed for all clusters, including staging.
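The layout in the private repository looks roughly like the following sketch (the secret key names are purely illustrative and service-specific):

profile::kubernetes::deployment_server_secrets::services:
  main:
    service-foo:
      staging:
        some_secret: "..."
      codfw:
        some_secret: "..."
      eqiad:
        some_secret: "..."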

Setting up Ingress

This is only needed if service-foo should be accessed via Ingress.

Follow Ingress#Add a new service to create the Ingress-related config, DNS records, etc.
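On the chart side this usually boils down to enabling Ingress in your values; a very rough sketch (the exact keys depend on your chart's version of the common templates, so treat the Ingress page linked above as authoritative):

ingress:
  enabled: true
  # further settings (gateway hosts, routing, etc.) are chart/template specific;
  # see the Ingress documentation linked above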

Set resource requests and limits for your service/containers

Your chart probably comes with some default settings regarding resource usage. Please benchmark your application and change the defaults accordingly. See Resource requests and limits for some background.
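An illustrative values.yaml snippet, assuming a chart based on the standard scaffold where resources live under main_app (the numbers are placeholders; use your benchmark results):

main_app:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    cpu: 1
    memory: 400Mi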

Deploy the service

At this point you should have a chart for your service (Creating a Helm Chart), and will need to set up a helmfile.d/services directory in the operations/deployment-charts repository for the deployment. You can copy the structure (helmfile.yaml, values.yaml, values-staging.yaml, etc.) from helmfile.d/services/_example_ and customize as needed.
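The resulting directory typically looks something like this (mirroring _example_; the exact set of values files depends on which environments you deploy to and how you split your overrides):

helmfile.d/services/service-foo/
  helmfile.yaml        # references the chart and the available environments
  values.yaml          # values shared by all environments
  values-staging.yaml  # staging-specific overrides
  # optionally values-eqiad.yaml / values-codfw.yaml for per-DC overrides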

If this service will be accessed directly via LVS (no Ingress), ensure the service has its port registered at Service ports ($SERVICE-FOO-PORT).

You can proceed to deploy the new service to staging for real.

On deploy1002:

cd /srv/deployment-charts/helmfile.d/services/service-foo
helmfile -e staging -i apply

The command above will show a diff related to the new service. Make sure that everything looks fine, then answer Yes to proceed.

Testing a service

  1. Now we can test the service in staging. Use the very handy endpoint: https://staging.svc.eqiad.wmnet:$SERVICE-FOO-PORT to quickly test if everything works as expected.
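For example, a quick check with curl (replace the port placeholder with the port registered for your service, and the path with an endpoint your service actually exposes; /healthz here is hypothetical):

curl -v "https://staging.svc.eqiad.wmnet:<SERVICE-FOO-PORT>/healthz"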

Deploy a service to production

  1. Ensure you have enabled TLS support via tls.enabled in your values.yaml (see the sketch after this list).
  2. Deploy the new service. On deploy1002:
    cd /srv/deployment-charts/helmfile.d/services/service-foo
    helmfile -e codfw -i apply
    # if that went fine
    helmfile -e eqiad -i apply
    

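As referenced in step 1 above, a minimal values.yaml sketch enabling TLS (depending on your chart's templates you may need additional TLS settings, e.g. the public port):

tls:
  enabled: true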
The service can now be accessed via the registered port on any of the kubernetes nodes (for manual testing).

Monitor the Service

Copy this template to a new one named after your new service, and edit accordingly. Please do not edit the original template!


Setting up LVS

This is only needed if service-foo should be accessed via LVS.

Follow LVS#Add_a_new_load_balanced_service to create a new LVS service on $SERVICE-FOO-PORT.

Add in Service Mesh

If other services will reach your new service via the service mesh (i.e. via Envoy), then it will need an entry in the services proxy listeners list in https://gerrit.wikimedia.org/g/operations/puppet:

hieradata/common/profile/services_proxy/envoy.yaml:

profile::services_proxy::envoy::listeners:
 # First, the discovery enabled services
 - name: parsoid-php
   port: 6002
   timeout: "30s"
   service: parsoid-php
   keepalive: "4s"
   retry:
     retry_on: "5xx"
     num_retries: 1
   
<snip>
# default listeners list used by the MW installations
profile::services_proxy::envoy::enabled_listeners:
 - parsoid-php
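Following the same structure, a hypothetical entry for service-foo could look like this (the port is made up and must be an unused listener port; timeout, keepalive and retry settings should match your service's needs):

profile::services_proxy::envoy::listeners:
 # ... existing listeners ...
 - name: service-foo
   port: 6099            # hypothetical; pick an unused local mesh port
   timeout: "30s"
   service: service-foo  # the upstream service, as in the entries above
   keepalive: "4s"
   retry:
     retry_on: "5xx"
     num_retries: 1

If MediaWiki needs to call the service, it also has to be added to the profile::services_proxy::envoy::enabled_listeners list shown above.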