Conftool

Conftool is a set of tools we use to sync and manage the dynamic-state configuration of a few services (as of June 2015, only varnish backend lists and the pybal pools). This configuration is stored in Etcd, a distributed key/value store.

Overview

Conftool takes its input from a set of configuration files kept in the conftool-data/ directory of the Puppet repository. These files represent a static view of the configuration: information about the services we manage, and which services are installed on which hosts. The other part of the equation is the dynamic state of that configuration (for example, the weight of a server in its pool and whether the server is pooled or not), which the sync leaves untouched apart from setting default values on newly added hosts.

The config files

Relative to the conftool root, configuration files are organized as follows:

  • the services directory contains a single file, data.yaml, which holds all the information on services in the following form:
cluster_name:
  service_name:
    port: 1234
    default_values:
      pooled: inactive
      weight: 10
    datacenters:
      - eqiad
  another_service:
  ...

Here, cluster_name should almost always correspond 1:1 to the 'cluster' we define in puppet.

  • the nodes directory instead contains one file per datacenter, for example nodes/eqiad.yaml. The format is again quite simple:
cluster_name:
  node_name.eqiad.wmnet:
    - service_name
    - another_service
  another_node.eqiad.wmnet:
    - service_name
...
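
To make the relationship between the two files concrete, here is a minimal Python sketch that walks the node lists and pairs each node entry with the default values of its service. It is not conftool's own code: the function name expand_static_view is hypothetical, and the dictionaries are hand-written stand-ins for the parsed YAML, reusing the placeholder names from the examples above.

```python
# Hypothetical in-memory copies of services/data.yaml and nodes/eqiad.yaml.
# The names are the placeholders used in the examples above.
services = {
    "cluster_name": {
        "service_name": {
            "port": 1234,
            "default_values": {"pooled": "inactive", "weight": 10},
            "datacenters": ["eqiad"],
        },
    },
}
nodes_by_dc = {
    "eqiad": {
        "cluster_name": {
            "node_name.eqiad.wmnet": ["service_name"],
            "another_node.eqiad.wmnet": ["service_name"],
        },
    },
}


def expand_static_view(services, nodes_by_dc):
    """Yield one (dc, cluster, service, node, defaults) tuple per node entry."""
    for dc, clusters in nodes_by_dc.items():
        for cluster, nodes in clusters.items():
            for node, node_services in nodes.items():
                for service in node_services:
                    definition = services[cluster][service]
                    # Copy so callers cannot mutate the shared defaults.
                    defaults = dict(definition["default_values"])
                    yield dc, cluster, service, node, defaults


for entry in expand_static_view(services, nodes_by_dc):
    print(entry)
```

Each tuple corresponds to one node/service pair; the defaults are what a newly added host would start with, which is why new nodes are typically not pooled.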

The tools

Currently, we have two tools, both installed on the puppetmaster:

  • conftool-sync, which syncs what we write in the files described above to the distributed key/value store (as of June 2015 it's Etcd, but this may well change in the future). In most cases you will not call conftool-sync directly; you will just call conftool-merge (in the near future, it will be invoked directly by our puppet-merge utility on the puppetmaster).
  • confctl, the tool to interact with the key/value store and set dynamic values; for the full details of how to use it, please see the README. A typical invocation could be:
confctl select dc=eqiad,cluster=cache_text,service=varnish-be,name=cp1052.eqiad.wmnet get

{"cp1052": {"pooled": "no", "weight": 0}}

where the argument to select is a comma-separated list of tags that identifies what you want to query. The invocation above selects the varnish backend service on cp1052 in the cache_text cluster of the eqiad datacenter.

The required tag list depends on the type of object, and conftool will complain if you don't specify the tags correctly. You can work on any type of object; you just need to specify the --object-type parameter. For example:

confctl --object-type service select cluster=cache_text,name=varnish-be get

will work as well.
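
If you script around confctl, the selector string and the JSON it prints are easy to handle with the standard library. The following Python sketch only builds and parses strings; the helper names build_selector and parse_get_output are hypothetical, and the sample output is the line from the example above.

```python
import json


def build_selector(**tags):
    """Join tag=value pairs into a comma-separated selector string."""
    return ",".join(f"{key}={value}" for key, value in tags.items())


def parse_get_output(line):
    """Decode one line of JSON as printed in the get example above."""
    return json.loads(line)


selector = build_selector(
    dc="eqiad",
    cluster="cache_text",
    service="varnish-be",
    name="cp1052.eqiad.wmnet",
)
state = parse_get_output('{"cp1052": {"pooled": "no", "weight": 0}}')

print(selector)
print(state["cp1052"]["pooled"])
```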

In puppet

Conftool is installed by including the conftool class in your node manifest. This does not install the conftool-data directory, though, which is part of the puppet git repository. The puppetmaster (puppetmaster1001) is therefore the standard machine on which to run conftool.

Operating

Add a service

If you need to add a service to a cluster, just edit the relevant yaml file under conftool-data/services, adding a service entry, and then run conftool-sync.

So for now you typically:

  • Create a puppet change adding the service stanza
  • On puppetmaster1001, you run puppet-merge
  • Again on puppetmaster1001, you run conftool-merge without arguments (this is a wrapper script that "does the right thing")

Add a server node to a service

confctl select 'service=(varnish-fe|nginx),name=<fqdn>' set/pooled=yes

If you need to add a server node to a pool, find the corresponding cluster in conftool-data/nodes/ and check whether the node stanza is present. If it is, just add the service to its list of services; if not, add the node's fqdn as a key under the cluster, with a list containing the service as its value.

After you have done that, you will need to merge the change in puppet and follow the steps outlined before for adding a service. Typically, though, new nodes will NOT be pooled, so if you want to pool your service you will need to modify the state of the node as shown below.

Modify the state of a server in a pool

Let's say we want to depool the server mw1018.eqiad.wmnet. The procedure is as follows:

  • The server is in the eqiad datacenter, is part of the appserver cluster in puppet, and the service we want to change is apache2. We need all this information as we'll see next.
  • Run, from any host where conftool is installed:
confctl select dc=eqiad,service=apache2,name=mw1018.eqiad.wmnet set/pooled=no
  • Verify that it worked with
confctl select name=mw1018.eqiad.wmnet get

The syntax for the set action is: set/key1=value1:key2=value2. A small note on the pooled value meaning:

  • yes means the server is pooled
  • no means the server is not pooled but (only in pybal) present in the config
  • inactive means the server is not in the config we write at all
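
As an illustration of the set syntax and the three pooled values, here is a small Python sketch that splits an action string into its key/value pairs and rejects unknown pooled states. It is not how confctl itself parses its arguments; parse_set_action and VALID_POOLED are hypothetical names used only for this example.

```python
VALID_POOLED = {"yes", "no", "inactive"}


def parse_set_action(action):
    """Split 'set/key1=value1:key2=value2' into a dict of string values."""
    verb, _, payload = action.partition("/")
    if verb != "set" or not payload:
        raise ValueError(f"not a set action: {action!r}")
    values = {}
    for pair in payload.split(":"):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed pair: {pair!r}")
        values[key] = value
    # Only the three states described above are accepted for 'pooled'.
    if "pooled" in values and values["pooled"] not in VALID_POOLED:
        raise ValueError(f"pooled must be one of {sorted(VALID_POOLED)}")
    return values


print(parse_set_action("set/pooled=no:weight=5"))
```

Values are kept as strings here; converting weight to an integer is left out to keep the sketch short.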

Pooling/depooling a server from all the related services

When a server is in maintenance mode or needs to be depooled/repooled in all of its services, you can select on the server name alone instead of giving the full list of tags. This way, confctl will act on every service that is present on the server you indicate:

confctl select name=foo.example.com set/pooled=no

Be careful on cache servers: this will depool the server not only from the load balancers, but also as a varnish backend! If you just want to depool the server from pybal, the best solution is:

confctl select 'cluster=CLUSTER,service=(varnish-fe|nginx),name=foo.example.com' set/pooled=no

where CLUSTER is the cluster the server belongs to (strictly speaking, additional tags are not required once you specify the name, but they slightly speed up execution).
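
The service=(varnish-fe|nginx) part of the selector reads like a regular-expression alternation. The Python sketch below shows which service names such a pattern would pick, under the assumption (not confirmed on this page) that the pattern has to match the whole tag value. The function name matches and the list of service names are only for illustration.

```python
import re


def matches(selector_value, tag_value):
    """Return True if the pattern matches the entire tag value.

    Assumption: whole-value matching; the real tool may behave differently.
    """
    return re.fullmatch(selector_value, tag_value) is not None


service_names = ["varnish-fe", "varnish-be", "nginx"]
selected = [name for name in service_names if matches("(varnish-fe|nginx)", name)]
print(selected)
```

Under that assumption the backend service varnish-be is left out, which is the point of restricting the selector on cache servers.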

Decommission a server

Decommissioning a server is as simple as:

  • Depool it from all services (as seen above)
  • Remove its stanza from conftool-data, then sync the data exactly in the way you did for adding a node.

Show pool status

Per-pool status is available at all times at http://config-master.wikimedia.org/conftool/DATACENTER/POOL, or via confctl like so:

 # confctl --tags dc=DATACENTER,cluster=CLUSTER,service=POOL --action get all | jq .
 {
   "restbase1011.eqiad.wmnet": {
     "weight": 10,
     "pooled": "yes"
   },
 ...
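
Output in this shape is straightforward to summarize from a script. The Python sketch below lists the pooled hosts and their combined weight. The function name summarize_pool is hypothetical, and the second host in the sample (restbase1012) is made up for the example; only the first entry comes from the output above.

```python
import json


def summarize_pool(raw_json):
    """Return (sorted list of pooled hosts, sum of their weights)."""
    state = json.loads(raw_json)
    pooled = sorted(
        host for host, values in state.items() if values["pooled"] == "yes"
    )
    total_weight = sum(state[host]["weight"] for host in pooled)
    return pooled, total_weight


# Sample input: the first host is from the output above, the second is invented.
sample = json.dumps(
    {
        "restbase1011.eqiad.wmnet": {"weight": 10, "pooled": "yes"},
        "restbase1012.eqiad.wmnet": {"weight": 10, "pooled": "no"},
    }
)

print(summarize_pool(sample))
```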

Server changing IP address

At the moment, PyBal does not redo DNS resolution. When a server changes IP address, for example when it is moved to a different row, it is necessary to make PyBal completely forget about this server. This can be done by setting the server to pooled=inactive, waiting a minute, and then pooling it again:

 # confctl select name=foo.example.net set/pooled=inactive
 # sleep 60
 # confctl select name=foo.example.net set/pooled=yes
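
If this needs to be done for several servers, the three steps can be generated rather than typed. The Python sketch below only builds the command lines and prints them (a dry run); it does not execute anything. The function name ip_change_plan is hypothetical, and the 60 second wait is simply the value used above.

```python
import shlex


def ip_change_plan(fqdn, wait_seconds=60):
    """Return the three commands shown above as argument lists."""
    selector = f"name={fqdn}"
    return [
        ["confctl", "select", selector, "set/pooled=inactive"],
        ["sleep", str(wait_seconds)],
        ["confctl", "select", selector, "set/pooled=yes"],
    ]


# Dry run: print the commands instead of executing them.
for argv in ip_change_plan("foo.example.net"):
    print(shlex.join(argv))
```

Printing first and running by hand keeps a human in the loop, which seems prudent for anything that changes pool state.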