Conftool/Load balanced services

All of our production LVS pools use Conftool, with etcd as the data source.

States in conftool and their meaning

Each LVS-DR backend server is represented by a node object in conftool. This object has two attributes: pooled and weight. While weight is simply the weight value used for the server in the LVS pool on the load balancer, pooled can assume three values corresponding to different states: yes, no and inactive.

Let's get into a bit more detail about all of those:

  • yes means the server is active and expected to be ready to serve requests. It will be included in the LVS pool if the health checks performed by PyBal are passing.
  • no means the server is temporarily removed from serving live traffic, be it for debugging purposes or for other maintenance. The server will not be pooled in the LVS pool, but it will still be considered active otherwise: PyBal checks will still be performed against it, it will still count towards PyBal's calculation of the depool threshold, it will still be included in scap's dsh groups, and so on. The only thing that changes with respect to yes is that PyBal will not send traffic to it, unless depooling it would take the pool below the depool threshold.
  • inactive means the server is out of rotation and expected to remain so for an extended period of time. It won't be included in PyBal's configuration or anywhere else - be it the scap dsh groups or any other place where this information is used.
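
For illustration, a node object is essentially a two-field record. The sketch below is hypothetical; in particular the etcd key layout shown in the comment, and the host and pool names, are assumptions.

# Illustrative only: a node object for one LVS backend, as it might be read back
# from conftool. The etcd key layout below is an assumption.
# e.g. key: /conftool/v1/pools/eqiad/appserver/apache2/mw1261.eqiad.wmnet
node_object = {
    "pooled": "yes",  # one of "yes", "no", "inactive" (see the list above)
    "weight": 10,     # weight assigned to the server in the LVS pool
}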

As explained above, the presence of safety measures (like the depool threshold) in PyBal means that setting "pooled=no" does not guarantee the server is no longer serving live traffic. Any automation that wants to perform actions on the cluster correctly should check PyBal's own HTTP status API to ensure the server is effectively depooled.
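
As a concrete illustration, below is a minimal sketch of such a check. It assumes PyBal's instrumentation HTTP API listens on port 9090 of each load balancer and that /pools/<service> returns one line per backend such as "mw1261.eqiad.wmnet: enabled/up/pooled"; the port, path and output format should be verified against the actual setup.

import requests

def is_depooled(load_balancers, service, server_fqdn):
    """Return True only if no load balancer reports server_fqdn as pooled."""
    for lb in load_balancers:
        # Assumed endpoint and output format of PyBal's instrumentation API.
        resp = requests.get(f"http://{lb}:9090/pools/{service}", timeout=5)
        resp.raise_for_status()
        for line in resp.text.splitlines():
            if line.startswith(server_fqdn) and line.rstrip().endswith("/pooled"):
                return False  # this load balancer still has the server pooled
    return True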

Helper scripts

There are several scripts you can use to operate on services that contribute to serving traffic to an LVS pool. They can be divided into three categories:

  1. Simple confctl shortcuts. These include pool, depool, drain and decommission. They just act on etcd and don't verify that the server is actually depooled in PyBal. Specifically:
    1. pool sets pooled=yes for all pools the server is included in, or just one if a command-line argument is provided.
    2. depool works like pool, but sets pooled=no
    3. decommission sets pooled=inactive
    4. drain instead sets weight=0
  2. Restart scripts. These are scripts with names like restart-<systemd service name>, where the systemd service is one of the services used to respond to requests from that pool. For example, on a cirrussearch node nginx is involved in serving all of the configured pools, while a service like elasticsearch_6@production-search-eqiad is only involved in serving a single LVS pool. Each of these scripts will do the following (see the sketch after this list):
    1. Depool in confctl the server from all pools that are affected by the service
    2. Verify with all the load balancers that the server is effectively depooled by querying the pybal api
    3. After retrying a certain number of times, if the server is still not depooled, fail
    4. Restart the service using systemctl
    5. Repool the services it depooled in step 1. IMPORTANT: services that were already depooled before the script ran do not get repooled.
    6. Verify again with all the load balancers that the server is effectively repooled
  3. Service-specific pool/depool scripts. These are scripts with names like {pool,depool}-<systemd service name>, where the systemd service is one of the services used to respond to requests from that pool. They verify that the server has actually been depooled/pooled in PyBal, much like the restart scripts do. Both scripts will only act if the server is in the correct state (pooled=yes for depooling, pooled=no for pooling), so they won't act on servers set as pooled=inactive. This is very convenient for e.g. deployment pre-release and post-release hooks, as it won't repool servers that are already inactive.
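
To make the restart-script flow above more concrete, here is a minimal, hypothetical sketch of what such a script does. It reuses is_depooled() from the earlier sketch; the confctl invocations, the output format they check, and the retry parameters are all assumptions, and the real generated scripts may differ.

import subprocess
import time

def currently_pooled(pool, fqdn):
    """Hypothetical helper: check whether the node is pooled=yes in conftool.

    Both the confctl invocation and the output format checked here are assumptions.
    """
    out = subprocess.run(
        ["confctl", "select", f"name={fqdn},service={pool}", "get"],
        capture_output=True, text=True, check=True,
    ).stdout
    return '"pooled": "yes"' in out

def restart_service(service, pools, load_balancers, fqdn, retries=10):
    # 1. Depool (pooled=no) from every affected pool, remembering which pools were
    #    pooled so that already-depooled pools don't get repooled at the end.
    to_repool = [pool for pool in pools if currently_pooled(pool, fqdn)]
    for pool in to_repool:
        subprocess.run(
            ["confctl", "select", f"name={fqdn},service={pool}", "set/pooled=no"],
            check=True,
        )
    # 2./3. Verify with every load balancer that the server is effectively depooled,
    #       retrying a few times and failing otherwise (e.g. when the depool threshold
    #       keeps the server pooled).
    for _ in range(retries):
        if all(is_depooled(load_balancers, pool, fqdn) for pool in pools):
            break
        time.sleep(3)
    else:
        raise RuntimeError(f"{fqdn} is still pooled according to PyBal, not restarting {service}")
    # 4. Restart the service.
    subprocess.run(["systemctl", "restart", service], check=True)
    # 5./6. Repool only what was depooled in step 1; verifying the repooling with PyBal
    #       is left out of this sketch.
    for pool in to_repool:
        subprocess.run(
            ["confctl", "select", f"name={fqdn},service={pool}", "set/pooled=yes"],
            check=True,
        )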

In most cases, the simple shortcuts should be avoided and the other scripts should be used instead: the simple shortcuts provide no feedback loop with PyBal and do not take the previous state of the server into account.

In cookbooks

When dealing with services in a load-balanced cluster, the usual pattern of action is:

  • Act on a percentage of all servers in the pool at the same time
  • Depool all services that were pooled (by setting pooled=no)
  • Perform the potentially disruptive actions we want
  • Repool all services we previously depooled

This logic is provided in Spicerack by the LBRemoteCluster class, which is usually instantiated via the spicerack.remote.query_confctl helper.

Below is an example of how to use such a construct:

confctl = spicerack.confctl('node')
remote = spicerack.remote()
jobrunners = remote.query_confctl(confctl, cluster='jobrunner')
# will run the commands on all the jobrunners, in batches of 3, accepting at most 2 batches that have
# failures before stopping execution.
jobrunners.run('systemctl show nginx.service -p MainPID', 'systemctl restart nginx.service', 
    svc_to_depool=['nginx'], batch_size=3, max_failed_batches=2)