Pipeline admin

Cluster overview

We currently have three clusters: one per DC in codfw and eqiad, plus a third one acting as a staging cluster. Each cluster has one or more controller nodes (where the kubernetes apiserver and controllers live) and two or more workers (kubelets). Each cluster also has an associated etcd cluster where the apiserver and calico store their configuration.

Cluster   Component                               Version
staging   kubernetes                              1.12.x
staging   calico                                  2.2.0
staging   etcd cluster (kubestagetcd100[1-3])     2.2.1
staging   helm                                    2.12
eqiad     kubernetes                              1.12.x
eqiad     calico                                  2.2.0
eqiad     etcd cluster (kubestagetcd100[1-3])     2.2.1
eqiad     helm                                    2.12
codfw     kubernetes                              1.12.x
codfw     calico                                  2.2.0
codfw     etcd cluster (kubestagetcd100[1-3])     2.2.1
codfw     helm                                    2.12

All clusters are configured via Puppet, and we build our own Debian packages (we do not use the upstream-provided ones). Cluster and service applications are managed via Helm, specifically via helmfile as a Helm management tool.

Cluster features

Ingress

We currently don't have an ingress controller installed in the cluster. Allowing external traffic into the cluster involves creating a NodePort service and setting up LVS configuration for the new service. This way, pooling and depooling services running in the cluster works the same way as for services running outside the cluster.
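
For example, once a chart exposes a service with type NodePort, the allocated node port that the LVS configuration needs to target can be read back with kubectl (the service name is hypothetical):

    # Print the node port allocated to the (hypothetical) service "myservice"
    source .hfenv; kubectl get svc myservice -o jsonpath='{.spec.ports[0].nodePort}'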

Admission Controllers

The list of admission controllers enabled per cluster is kept in hiera files in Puppet ( [1] [2] ).
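
Hiera in Puppet is the source of truth; as a sanity check (not the canonical procedure), the plugins the running apiserver was started with can be inspected on a controller node, assuming the Kubernetes 1.12 flag name:

    # Print the admission plugin flag from the running apiserver command line
    ps -C kube-apiserver -o args= | tr ' ' '\n' | grep -- --enable-admission-plugins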

Scaling

Cluster scaling is done manually when new nodes are requested, provisioned and installed. Application autoscaling is not enabled yet; for it to work, metrics-server would need to be installed, and metrics-server requires setting up the API aggregation layer in the cluster.
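
Purely as an illustration of what application autoscaling would look like once metrics-server and the aggregation layer were in place (hypothetical deployment name; this does not work today and is not necessarily how it would be rolled out here):

    # Hypothetical example; this will NOT work until metrics-server is installed
    kubectl autoscale deployment myservice --min=2 --max=8 --cpu-percent=80
    kubectl get hpa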

In-cluster DNS

CoreDNS has been packaged and enabled for use in the cluster. Note that since autoscaling is not enabled we should run enough replicas to absorb a spike in DNS requests.
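
For a quick look at how many replicas are currently running (the deployment name and namespace are assumptions; the persistent replica count would normally be set through the CoreDNS chart values in the admin helmfile):

    # Assumes the CoreDNS deployment is named "coredns" and lives in kube-system
    kubectl -n kube-system get deployment coredns
    # Manually scaling it would look like this (helmfile may later revert it)
    kubectl -n kube-system scale deployment coredns --replicas=4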

Cluster upgrades

Minor upgrades

Upgrading a minor version, e.g. from 1.12.1 to 1.12.2, should be safe. It involves generating new packages, uploading them to the Debian repository, and then upgrading first the control plane and later the nodes: drain each node first, upgrade it, and uncordon it afterwards.
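
A sketch of the per-node part of that procedure (the node name is just an example; the package upgrade itself is omitted):

    # Drain the node so its pods are rescheduled elsewhere
    kubectl drain kubernetes1001.eqiad.wmnet --ignore-daemonsets --delete-local-data
    # ... upgrade the kubernetes Debian packages on the node and restart the kubelet ...
    # Allow the node to receive pods again
    kubectl uncordon kubernetes1001.eqiad.wmnet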

Major upgrades

Major upgrades are either straightforward or really hard to do. Any new major upgrade should be tested first in staging; after some days of testing, evaluate whether the upgrade should move forward into production.


For some upgrades, like moving from 1.12 to 1.13, it might be easier to create a new cluster and reinstall applications from the state stored in helmfile.

Authentication and authorization

We use tokens for authentication; tokens are kept in the private repo and populated into kubeconfig files on the deployment servers. Other methods of authentication, like LDAP (via webhook) or OAuth, are not actively considered for security and practical reasons, so we are going to stick with tokens for now.

Authorization is granted via RBAC. The RBAC rules are simple enough:

  • Service users can only do read-only operations and port-forward. These users are the ones that service owners use, and their privileges are granted through the deploy rolebinding.
  • There are tiller users that can deploy into a given namespace. These privileges are granted through the tiller rolebinding.
  • Admins can do anything in any namespace. Only the serviceops team can perform admin actions.
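
A quick way to check what the token in your kubeconfig is allowed to do is kubectl auth can-i; for example, from a (hypothetical) service directory on a deployment server:

    cd /srv/deployment-charts/helmfile.d/services/staging/myservice   # hypothetical service
    source .hfenv; kubectl auth can-i get pods      # a service user should get "yes"
    source .hfenv; kubectl auth can-i delete pods   # a service user should get "no"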

Namespace creation policy

Any new service or application should live in its own namespace. There are exemptions to this rule: multiple copies of the same service can run at the same time, for instance in order to do canary releases; in such cases it is acceptable to run more than one service per namespace.

Take into account that some limits, cluster resource quotas, and the authorization model are bound to a given namespace.


Namespace creation is managed by the Service Operations team. If you need a new service, create a service request task under https://phabricator.wikimedia.org/project/profile/1305/ (use the link at the bottom of the description) and follow the steps to be onboarded into the cluster.

User services contract

Services should fulfill the following contract:

  • Metrics should be exposed in Prometheus format under the common /metrics endpoint. If your service uses https://github.com/wikimedia/service-runner and the scaffold version of a helm chart for the application, you should be covered and everything should work out of the box, since prometheus-statsd-exporter will export such metrics.
  • Logs: your application should log whatever is important for debugging to stdout. Logs will be collected from the containers' stdout and aggregated in Kibana, available to be queried. This is not yet possible.
  • Readiness and liveness endpoints: your application should expose a /healthz endpoint that signals whether your service is healthy. If the application is healthy it should return a 200, otherwise an error code. You should also expose a /ready endpoint to signal whether your application is ready to handle more traffic; having it helps avoid being sent more traffic while the internal request queue is still being processed.
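
A minimal sketch of checking a locally running instance against this contract (the port is hypothetical):

    curl -s http://localhost:8080/metrics | head    # Prometheus-format metrics
    curl -i http://localhost:8080/healthz           # expect HTTP 200 when healthy
    curl -i http://localhost:8080/ready             # expect HTTP 200 when ready for traffic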

Managing applications in the cluster

Cluster applications (like CoreDNS, metrics-server and others) and user applications are installed and managed via Helm in the clusters. In order to manage releases, i.e. a set of helm charts and value files, we use helmfile as a management tool and Helm wrapper. We keep the charts and helmfile definitions in the deployment-charts repo.

  • charts/ includes the chart definitions and our own chart repository, served under https://releases.wikimedia.org/charts/ . You can use create_new_service.sh if you want to add a new service, or import and adapt an existing chart.
  • helmfile.d/ includes the helmfile definitions and uses the aforementioned charts to define releases. Each release is located under admin/ for cluster-wide applications and configuration, or services/ for tenant definitions.

Any developer can submit CRs to the repo and merge them.

Code deployment/configuration changes

Note that both new code deployments and configuration changes are considered a deployment!

  1. Clone the deployment-charts repo.
  2. Using your editor, modify the files under the helmfile.d folder for the service you want to change. As an example, the myservice deployment on the staging cluster lives under deployment-charts/helmfile.d/services/staging/myservice. Most changes are usually made in the values.yaml file to tune the deployment parameters.
  3. If you need to update or add a secret like a password or a certificate, commit it into the private puppet repo; do not commit secrets into the deployment-charts repo.
  4. Make a CR and, after a successful review, merge it.
  5. After the merge, log in to a deployment server; a cron job (running every minute) will update the /srv/deployment-charts directory with the contents from git.
  6. Go to /srv/deployment-charts/helmfile.d/services/${CLUSTER}/${SERVICE}, where CLUSTER is one of (staging, eqiad, codfw) and SERVICE is the name of your service, e.g. myservice. Go to /srv/deployment-charts/helmfile.d/admin/${CLUSTER} instead for admin services and admin operations if you are a cluster admin.
  7. Execute source .hfenv; helmfile diff . This will show the changes that will be applied on the cluster.
  8. Execute source .hfenv; helmfile apply . This will materialize the previous diff in the cluster and also log the change to SAL.
  9. All done!

In case there are multiple releases of your service in the same helmfile, you can use the --selector name=RELEASE_NAME option, e.g. source .hfenv; helmfile --selector name=test status.
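
Putting steps 6 to 8 together, deploying the hypothetical myservice to the staging cluster looks like this:

    cd /srv/deployment-charts/helmfile.d/services/staging/myservice
    source .hfenv; helmfile diff     # review the pending changes
    source .hfenv; helmfile apply    # apply them; the change is logged to SAL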

Seeing the current status

This is done using helmfile:

  1. Change directory to /srv/deployment-charts/helmfile.d/services/${CLUSTER}/${SERVICE} on a deployment server
  2. Unless you have pending un-applied changes, the current values files should reflect the deployed values
  3. You can check for unapplied changes with: source .hfenv; helmfile diff
  4. You can see the status with source .hfenv; helmfile status

Rolling back changes

If you need to roll back a change because something went wrong:

  1. Revert the git commit to the deployment-charts repo
  2. Merge the revert (with review if needed)
  3. Wait one minute for the cron job to pull the change to the deployment server
  4. Change directory to /srv/deployment-charts/helmfile.d/services/${CLUSTER}/${SERVICE} where CLUSTER is one of (staging,eqiad,codfw) and SERVICE is the name of your service
  5. Execute source .hfenv; helmfile diff to see what you'll be changing
  6. Execute source .hfenv; helmfile apply

Rolling back in an emergency

If you can't wait the one minute, or the cron job that updates from git fails, etc., then it is possible to manually roll back using helm. This is discouraged in favour of using helmfile though.

  1. Find the revision to roll back to
    1. source .hfenv; helm history <production> --tiller-namespace YOUR_SERVICE_NAMESPACE
    2. Pick the revision to roll back to, e.g. perhaps the penultimate one:
      REVISION        UPDATED                         STATUS          CHART           DESCRIPTION     
      1               Tue Jun 18 08:39:20 2019        SUPERSEDED      termbox-0.0.2   Install complete
      2               Wed Jun 19 08:20:42 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      3               Wed Jun 19 10:33:34 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      4               Tue Jul  9 14:21:39 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
      
  2. Rollback with: source .hfenv; helm rollback <production> 3 --tiller-namespace YOUR_SERVICE_NAMESPACE

Why do I need to source that file?

That is a temporary workaround for helmfile and helm-diff not honoring the --helm-home and --kubeconfig flags; the moment these tools honor those flags we are going to remove it.

Advanced use cases: using kubeconfig

If you need to use kubeconfig (for a port-forward or to get logs for debugging), go to /srv/deployment-charts/helmfile.d/services/${CLUSTER}/${SERVICE}, where CLUSTER is one of (staging, eqiad, codfw) and SERVICE is the name of your service, e.g. myservice, and execute source .hfenv; kubectl COMMAND
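
For example, to inspect pods and logs of the hypothetical myservice on staging (the pod name is made up; get the real one from kubectl get pods):

    cd /srv/deployment-charts/helmfile.d/services/staging/myservice
    source .hfenv; kubectl get pods
    source .hfenv; kubectl logs myservice-production-abc123                # pod name is an example
    source .hfenv; kubectl port-forward myservice-production-abc123 8080:8080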

Advanced use cases: using helm

Sometimes you might need to use helm directly. This is completely discouraged; use it only at your own risk and in emergencies.

  • Go to /srv/deployment-charts/helmfile.d/services/${CLUSTER}/${SERVICE}, where CLUSTER is one of (staging, eqiad, codfw) and SERVICE is the name of your service, e.g. myservice
  • source .hfenv; helm COMMAND
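
For example, listing the releases managed by the tiller of a (hypothetical) myservice namespace:

    cd /srv/deployment-charts/helmfile.d/services/staging/myservice
    source .hfenv; helm list --tiller-namespace myservice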

Adding a new namespace and service

If you want to create a new namespace (per the policy, that means adding a new service), you should do the following steps:

  • Create a stanza in the Puppet repo that will create the environment for your service (among other things the .hfenv file we are using almost all the time). Remember that you also need to generate a token and commit it to the private repo.
  • Add a values yaml file in each environment (staging, eqiad, codfw), including the name of the namespace, the quota and the limits for the namespace.
  • Add an entry in the envs.yaml file in each environment referring to the just-created values.yaml; use the same name as the namespace for sanity.
  • Commit the changes and prepare a CR for the deployment-charts repo.
  • After the merge, go to a deployment server, change to /srv/deployment-charts/helmfile.d and run ./admin/{staging,codfw,eqiad}/cluster-helmfile.sh diff; if the diff seems correct, apply it (see the example below).
  • After the apply command the new namespace should have been created with all the required pieces (a running tiller, rolebindings, quotas etc).
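
For example, for the staging cluster:

    cd /srv/deployment-charts/helmfile.d
    ./admin/staging/cluster-helmfile.sh diff     # review what will change
    ./admin/staging/cluster-helmfile.sh apply    # creates the namespace, tiller, rolebindings, quotas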

After the namespace creation, you can apply the new application as described in the managing applications section above.

Admin services

Admin services are special applications meant to provide cluster services. These applications are managed like normal applications, but they live under helmfile.d/admin/$CLUSTER/adminservice and they are usually installed in the kube-system namespace or in a dedicated namespace.

Managing secrets for applications

If you need to add secrets like certificates, you should commit them into the private repo. There is Puppet code that reads these hiera files and generates the appropriate files in the appropriate folder. Private data is kept in a ./private/values.yaml file for each application (including admin services).


Helmfile: common errors

If you want to check the status of a helmfile release, you can execute the helmfile status command in the right directory; for admin operations it is probably more insightful to run helm list, a command that will output a table with the status of all releases.

Common error #1: Error: UPGRADE FAILED: "foo" has no deployed releases

This happens when you try to upgrade over a failed helm release: the first attempt did not succeed, you are trying to upgrade from a failed one, and helm will refuse to do it. Using helm list will show you the failed release and the failure reason. If this is the first revision of a release, doing a helm del --purge release should clean up the state and then you can retry the installation. If what failed was an upgrade, you can always find a working revision with helm history release and then roll back to the last working one using helm rollback release revision before trying the upgrade again.
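
A sketch of that recovery sequence (the release name foo, its namespace and the revision number are all hypothetical):

    # Inspect the failed release and its failure reason
    source .hfenv; helm list --tiller-namespace foo
    # If the very first install failed: purge it, then retry the installation
    source .hfenv; helm del --purge foo --tiller-namespace foo
    # If an upgrade failed: find the last good revision and roll back to it
    source .hfenv; helm history foo --tiller-namespace foo
    source .hfenv; helm rollback foo 3 --tiller-namespace foo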

Common error #2: EOF

This usually means that tiller is taking longer than expected, caused by an apiserver lock and maybe a slow underlying etcd operation. This is an indicator that the apiserver is under heavy load, or more probably that you are doing too many helm operations at once.

Initializing a new cluster

If you create a brand new cluster and want to manage it using helmfile, you will need to initialize the cluster by creating a cluster-scoped tiller. There is a script in deployment-charts/helmfile.d which does precisely that, and you can call it like ./initialize_cluster.sh kube-system kubemaster.svc.eqiad.wmnet 6443

That will initialize the cluster pointed to by the current kubeconfig, deploying a tiller in the kube-system namespace, with the apiserver host set to kubemaster.svc.eqiad.wmnet and the apiserver port to 6443.