Portal:Toolforge/Admin/Kubernetes/Etcd (deprecated)

This documentation is entirely deprecated as of the shutdown of the Kubernetes 1.4 cluster. It is preserved for history only.

We will update or replace this document soon.

Introduction

etcd is a distributed reliable key-value store for critical data of a distributed system. It is:

  • Simple: well-defined, user-facing API (gRPC)
  • Secure: automatic TLS with optional client cert authentication -- we are relying on firewall rules instead at this time.
  • Fast: benchmarked 10,000 writes/sec
  • Reliable: properly distributed using Raft

Etcd is a component of both Kubernetes and the Toolforge proxy system (via the network overlay, flannel).

Kubernetes stores persistent state in an etcd cluster - all other components are stateless. The etcd cluster is only accessed directly by the API server and no other component. Direct access to this etcd cluster is equivalent to root on the entire k8s cluster, so it is firewalled off to only be reachable from the instance running the k8s 'master' (i.e. the rest of the control plane).
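
For reference, the API server is pointed at the cluster through its --etcd-servers flag. A rough sketch of what that looks like (the real invocation on the master is managed by puppet and may differ):

# hypothetical invocation on the k8s master; the real unit and flags are puppet-managed
/usr/bin/kube-apiserver \
  --etcd-servers=https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379,https://tools-k8s-etcd-02.tools.eqiad1.wikimedia.cloud:2379,https://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2379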

We currently use a 3-node cluster: tools-k8s-etcd-0[1-3]. They're all smallish Debian Jessie instances configured by the same etcd puppet code we use in production.

Flannel, which acts as the network overlay for k8s, is run out of a second etcd cluster (prefix tools-flannel-etcd-). That cluster is connected to both by the Kubernetes nodes and by servers outside of k8s (previously the bastions, now possibly only the proxy systems). This allows the proxies to act as a primitive sort of ingress controller for Toolforge.

Cluster Management

Health Monitoring

The etcdctl tool can be used to check the cluster-wide health information. When invoked, it will attempt to connect to all of the listed members of the cluster, gather the health status of each, and display the overall health result to you:

Log in to one of the etcd servers via SSH (e.g. tools-k8s-etcd-01):

$ ssh tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud

Execute etcdctl with the arguments below.

$ etcdctl --timeout "30s" -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 cluster-health

If the cluster is healthy, you will see output very similar to the following:

member 6d1cfa48660001 is healthy: got healthy result from https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379
member 500eb342ea73b85e is healthy: got healthy result from https://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2379
member 821778dbbc5672b1 is healthy: got healthy result from https://tools-k8s-etcd-02.tools.eqiad1.wikimedia.cloud:2379

At a lower level, the etcd service exposes health information over HTTP(S) at the /health URI in JSON format. If the returned string is {"health":"true"}, the cluster is healthy. This is especially useful for automated (machine) health monitoring.

$ curl https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379/health
{"health": "true"}

Overriding the Default Timeout Value

Note that we had to explicitly set the timeout to 30s.

At the time this article was written, the Cloud VPS Kubernetes cluster used etcd version 2.2.1, which has a default connect timeout of 1s.

Unfortunately, at certain times of day it may take longer than 1s for etcdctl to connect to the members, so it may report an incorrect cluster health status (e.g. showing members as down instead of healthy). This is why we recommend increasing the timeout to something longer than 1s (e.g. 30s is reasonable).

Debugging

The etcd service provides a couple of ways to make cluster debugging a bit easier.

Enabling/Disabling Debug Logging

While the etcd service is running, it is possible to change the log level at runtime without stopping the service. The etcd REST API exposes the logging configuration at /config/local/log (note that this endpoint is no longer valid in v3.5 and higher).

To enable debug logging on tools-k8s-etcd-01 at runtime:

$ curl https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379/config/local/log -XPUT -d '{"Level":"DEBUG"}'
# debug logging enabled

To disable debug logging on tools-k8s-etcd-01 at runtime:

$ curl https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379/config/local/log -XPUT -d '{"Level":"INFO"}'
# debug logging disabled

Note that you can perform this on other members if needed; just replace the host tools-k8s-etcd-01 with tools-k8s-etcd-02, for example. A loop over all three members is sketched below.
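
A sketch of doing this for all three members at once (same hostnames as above; switch DEBUG to INFO to turn it back off):

$ for n in 01 02 03; do curl -s "https://tools-k8s-etcd-${n}.tools.eqiad1.wikimedia.cloud:2379/config/local/log" -XPUT -d '{"Level":"DEBUG"}'; done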

Starting/Stopping etcd service

On the tools-k8s-etcd-0[1-3] servers, the etcd service is managed by systemd. To start or stop the service, log in to the target etcd server and invoke the systemctl utility to change the service state.

For example, to start the etcd service on tools-k8s-etcd-01

$ ssh tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud
$ sudo systemctl start etcd

For example, to stop the etcd service on tools-k8s-etcd-01

$ ssh tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud
$ sudo systemctl stop etcd

Logs

The etcd logs can be found in /var/log/daemon.log or via journalctl -u etcd.
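
For example, to view recent entries or to follow the log live:

$ sudo journalctl -u etcd --since "1 hour ago"
$ sudo journalctl -u etcd -f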

Membership Management

If you are considering altering the membership of a running cluster, you should probably take a backup.

Adding a Member

If you try to add a node that has data in its configured data directory, it will fail! If you are adding a node back after removal, make sure its data directory (e.g. /var/lib/etcd/tools-k8s/member) is empty first; a quick check is sketched just below.
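
A quick way to check (assuming the data directory path above); no output, or a "No such file or directory" error, means you are safe to proceed:

$ sudo ls -A /var/lib/etcd/tools-k8s/member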


etcdctl member add [options]

Introduces a new peer member into an etcd cluster. In general, you add the member to the cluster and THEN start etcd -- this means etcd on the new node should be stopped before running this, which helps make sure the data directory is empty. The command will also fail if the new member's ETCD_INITIAL_CLUSTER environment variable doesn't match what the cluster will look like once the new node is added: if you only have one node right now, that variable must list two members (the existing one plus the new one), roughly as in the sketch below. You will need to disable puppet and edit the systemd unit file to change this variable before starting the service.
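
As a sketch only, for a hypothetical two-member case (the existing tools-k8s-etcd-01 plus a new tools-k8s-etcd-03), the relevant variables in the new member's unit file would look roughly like this:

ETCD_INITIAL_CLUSTER="tools-k8s-etcd-01=http://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2380,tools-k8s-etcd-03=http://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"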

Also, if this is a new node rather than a pre-existing one, make sure that you have already added it to the hiera key flannel::etcd_hosts or k8s::etcdhosts, as appropriate. For Toolforge, you will find this value on wikitech. This key controls the firewall; if the firewall blocks communication, adding the node will not go well once you start it. A quick connectivity check is sketched below.
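
One quick way to confirm the firewall is open, run from an existing etcd member (this assumes netcat is installed and that the standard peer/client ports 2380/2379 are in use):

$ nc -z -w 5 tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud 2380 && echo "peer port reachable"
$ nc -z -w 5 tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud 2379 && echo "client port reachable"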

For example, add tools-k8s-etcd-03:

$ etcdctl -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member add tools-k8s-etcd-03 http://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2380

Member ced000fda4d05edf added to cluster 8c4281cc65c7b112

One last warning here: Be especially careful when changing to an even number of nodes. Take a backup before you do, if in doubt.

Removing a Member

etcdctl member remove

Removes a member of an etcd cluster. For example, remove member ef37ad9dc622a7c4 from the cluster:

$ etcdctl -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member remove ef37ad9dc622a7c4

Displaying current Member List

etcdctl member list

Displays the member details for all current members of an etcd cluster.

For example:

$ etcdctl -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member list
6d1cfa48660001: name=tools-k8s-etcd-01 peerURLs=http://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2380 clientURLs=https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379
500eb342ea73b85e: name=tools-k8s-etcd-03 peerURLs=http://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2380 clientURLs=https://tools-k8s-etcd-03.tools.eqiad1.wikimedia.cloud:2379
821778dbbc5672b1: name=tools-k8s-etcd-02 peerURLs=http://tools-k8s-etcd-02.tools.eqiad1.wikimedia.cloud:2380 clientURLs=https://tools-k8s-etcd-02.tools.eqiad1.wikimedia.cloud:2379

Data

Taking a backup

To take a backup, first shut down the etcd service with `systemctl stop etcd`. Check to make sure it is stopped (something like ps -ef | grep etcd will do). Then you need only run a fairly quick command (depending on the size of the dataset).

$ sudo etcdctl backup --data-dir /var/lib/etcd/tools-k8s --backup-dir /home/me/tools-k8s-etcd-backup/

No timeout is needed since this is a local command. I recommend using sudo to avoid permission problems. Do this before you do anything destructive like manually changing things in the data or altering the size of the cluster (especially to 2 members, which can be quite damaging).

Note that the backup data won't contain the identity of the current node (i.e. node ID and cluster ID). This prevents the new node restored from this backup from inadvertently joining onto an existing cluster. Due to this design, in order to recreate a cluster from the backup, you will need to start a new, single-node cluster.
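
Before doing anything destructive, it is worth sanity-checking that the backup directory actually contains data (path as in the example above; the exact layout can vary between etcd versions):

$ sudo ls -R /home/me/tools-k8s-etcd-backup/
$ sudo du -sh /home/me/tools-k8s-etcd-backup/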

Lifecycle

The etcd service stores its data in the data directory located at /var/lib/etcd/tools-k8s. This includes the write-ahead log (WAL), which records the local member ID, the cluster ID, and the initial cluster configuration. The write-ahead log and snapshot files are used during member operation and to recover after a cold restart.

If a member's data directory is ever corrupted, you should remove that member from the cluster using the etcdctl tool.

You should avoid restarting an etcd member with a data directory restored from an out-of-date backup. Using an out-of-date data directory can lead to inconsistency: the member has already agreed (via raft) to store certain information, and would then re-join claiming it needs that information again.

For maximum safety, if an etcd member suffers any sort of data corruption or loss, it must be removed from the cluster. Once removed, the member can be re-added with an empty data directory.
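
A condensed sketch of that remove/re-add cycle, assuming the tools-k8s data directory path used elsewhere on this page; <member-id> is a placeholder for the ID shown by `member list`:

$ etcdctl -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member remove <member-id>   # from a healthy member
$ sudo systemctl stop etcd                                                                            # on the broken member
$ sudo rm -rf /var/lib/etcd/tools-k8s/member                                                          # clear its data directory

Then re-add the member (see Adding a Member above) and start etcd on it again.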

Contents

The data directory has two sub-directories:

  • wal: write ahead log files
  • snap: log snapshots
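
For example, on one of the tools-k8s etcd members (assuming the data directory path used above):

$ sudo ls /var/lib/etcd/tools-k8s/member
snap  wal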

Recovering from backup in etcd v2

Don't panic

As long as you have a backup, you can get etcd running again. Start by shutting down etcd (preferably on every node, because they will all need to be re-added manually). Don't use your backup directly for anything; keep it safely out of harm's way in case you need to start over.

Take a look at the values in /lib/systemd/system/etcd.service. Most of it is the environment-variable form of etcd's command line flags. You'll notice that it runs as the etcd user, not root. This will all be useful.
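
A convenient way to see the unit file (plus any drop-ins) without hunting for the path, assuming the unit is named etcd as above:

$ systemctl cat etcd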

  1. Disable puppet agent.
  2. If etcd is running, stop the daemon with `systemctl stop etcd` and make sure it died.
  3. Initially, copy your backup to whatever dir you like (/tmp locations work). You can even copy it over the spot that the running daemon will use if you are confident that your backup is good, but your current data is not.
  4. As root (if you need root's permissions where your backup is) or as etcd (if that user can write to where you have your copy of the backup), first export the following values from the etcd.service file above. In the example below we are restoring the tools-k8s etcd cluster (not flannel or toolsbeta), starting with tools-k8s-etcd-01:
       # export ETCD_NAME=tools-k8s-etcd-01
       # export ETCD_CERT_FILE=/var/lib/etcd/ssl/certs/cert.pem
       # export ETCD_KEY_FILE=/var/lib/etcd/ssl/private_keys/server.key
       # export ETCD_LISTEN_CLIENT_URLS=https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379
    

    Most of the other values will not matter for this next step, because it overwrites the peer URL anyway.
  5. In the same shell, start etcd by hand, not using systemd like so:
    # /usr/bin/etcd --force-new-cluster --data-dir=/path/to/copy/of/backup/just/above/member
    
  6. If etcd is working well enough that way, shut it down
    # sudo pkill etcd
    
  7. Take that copy of the backup and copy it into the location where it should be when things are fully running. Then run
    # chown -R etcd:etcd /var/lib/etcd/tools-k8s/
    

    Presuming that's where you should have the data.
  8. Start etcd with `systemctl start etcd`. It will show some log errors about the nodes in ETCD_INITIAL_CLUSTER that aren't running. Changing that variable before starting will eliminate those errors, but really, they don't matter.
  9. From another node (because localhost is blocked at the firewall) get the member ID and then use it to change the peer listen url to something useful (it's localhost right now):
    $ etcdctl --timeout "30s" -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member list
    

    $ etcdctl --timeout "30s" -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member update 6d1cfa48660001 http://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2380
    

    where 6d1cfa48660001 is the member id from `member list` and `http://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2380` is the correct peer URL, taken from what you initially found in `/lib/systemd/system/etcd.service`.
  10. Add the other nodes as in #Adding_a_Member
  11. Enable puppet on all nodes.
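
Once all members are back and puppet is re-enabled everywhere, it is worth running a final check from another node (remember that localhost is blocked at the firewall):

$ etcdctl --timeout "30s" -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 cluster-health
$ etcdctl --timeout "30s" -C https://tools-k8s-etcd-01.tools.eqiad1.wikimedia.cloud:2379 member list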