dbctl

From Wikitech

dbctl is a tool based on conftool for storing MediaWiki's database configuration in etcd. Its code lives in conftool/extensions/dbconfig in the operations/software/conftool repository.

In production, the only hosts with dbctl installed are the cumin cluster management hosts (e.g. cumin1001).

Background

Prior to dbctl, the database load balancer configuration was kept in static PHP files in the operations/mediawiki-config repo, for example wmf-config/db-eqiad.php. This meant that routine maintenance operations often required several configuration deploys, which was time-consuming, tedious, and error-prone, and would sometimes block (or be blocked by) others doing 'real' deploys.

Schema

The JSON output is intended to correspond to elements in the $wgLBFactoryConf configuration.

Reading the etcd key /v2/keys/conftool/v1/mediawiki-config/$DATACENTER/dbconfig returns a dict with a single element, whose key is val and whose value is a string of JSON. Sample output (formatted manually for human readability):

"val":                                                                                                                                                                 
    {
        "groupLoadsBySection": {
            "s4": {
                "contributions": {"db1097:3314": 1, "db1103:3314": 1},
                "api": {"db1138": 1, "db1084": 3},
                ...
            }
        },
        "sectionLoads": {
            "s5": [
                {"db1070": 0},
                {"db1110": 500, "db1113:3315": 1, "db1130": 500, "db1096:3315": 100, "db1082": 200, "db1097:3315": 100, "db1100": 100}
            ],
            "s4": [
                ...
            ]
        },
        "readOnlyBySection": {}
    }

Similar output can be viewed using dbctl config get or at https://noc.wikimedia.org/dbconfig/eqiad.json and https://noc.wikimedia.org/dbconfig/codfw.json.
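Because val holds a JSON document encoded as a string, a consumer has to decode twice: once for the outer dict and once for the string inside it. The minimal Python sketch below assumes you already have the dict described above as text (fetching it from etcd is left out). RAW is a hypothetical, abbreviated sample built from the values shown above, and decode_dbconfig is an illustrative helper, not part of dbctl.

```python
import json

# Hypothetical, abbreviated stand-in for the etcd value described above:
# an outer dict whose only key, "val", holds a *string* of JSON.
RAW = json.dumps({
    "val": json.dumps({
        "groupLoadsBySection": {"s4": {"api": {"db1138": 1, "db1084": 3}}},
        "sectionLoads": {"s5": [{"db1070": 0}, {"db1110": 500, "db1130": 500}]},
        "readOnlyBySection": {},
    })
})


def decode_dbconfig(raw: str) -> dict:
    """Illustrative helper: parse the outer dict, then the JSON string under "val"."""
    outer = json.loads(raw)
    return json.loads(outer["val"])


if __name__ == "__main__":
    config = decode_dbconfig(RAW)
    print(sorted(config))  # ['groupLoadsBySection', 'readOnlyBySection', 'sectionLoads']
```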

The three top-level keys (groupLoadsBySection, sectionLoads and readOnlyBySection) are names of array elements in $wgLBFactoryConf:

groupLoadsBySection contains a dictionary of 'sections' (groups of db servers that serve a specific set of wikis). Each section's value is itself a dictionary in which:

  • each key is the name of a group ('api' for db servers that back MediaWiki API requests, 'vslow' for db servers that handle extremely slow queries, and so on)
  • each value is a dict of servers and their relative weights for traffic within that group
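The weights are relative, so what matters is each server's weight as a fraction of the group's total. Assuming MediaWiki picks servers at random in proportion to weight (an assumption, not something stated on this page), the expected share of a group's traffic can be sketched in Python as follows. traffic_shares is a hypothetical helper, and the input is the s4 api group from the sample above.

```python
def traffic_shares(weights: dict) -> dict:
    """Hypothetical helper: map each server to its expected fraction of the group's traffic."""
    total = sum(weights.values())
    if total == 0:
        # Nothing has a positive weight, so no traffic is apportioned.
        return {server: 0.0 for server in weights}
    return {server: weight / total for server, weight in weights.items()}


if __name__ == "__main__":
    # The s4 "api" group from the sample output above.
    print(traffic_shares({"db1138": 1, "db1084": 3}))
    # {'db1138': 0.25, 'db1084': 0.75}
```

Under that assumption, db1084 would be expected to serve about three times as many s4 api requests as db1138.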

sectionLoads contains a dictionary of 'sections' where each section's value is an array with exactly two elements:

  • a dictionary with exactly one entry: the primary server (master), typically with 0 weight, which should only receive write traffic and whose replication lag MediaWiki ignores.
  • a dict of db servers, with weights determining how read traffic is apportioned for requests not directed at a specific db server group

The schema of sectionLoads is a compromise between MediaWiki's (ab)use of PHP conventions (where associative arrays are actually ordered, and the first entry is assumed to be the master) and JSON dictionaries (which are defined as unordered).
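To recover the PHP-style ordering from the JSON, a consumer can take the single entry of the first element as the master and append the second element's servers after it. A minimal Python sketch with hypothetical helper names; since JSON dictionaries are unordered, only the master-first position in the result is meaningful, not the order of the replicas.

```python
def split_section_loads(entry: list) -> tuple:
    """Hypothetical helper: split [{master: weight}, {server: weight, ...}] into its parts."""
    if len(entry) != 2 or len(entry[0]) != 1:
        raise ValueError("expected [{master: weight}, {server: weight, ...}]")
    ((master, master_weight),) = entry[0].items()
    return master, master_weight, dict(entry[1])


def to_ordered_loads(entry: list) -> list:
    """Hypothetical helper: (server, weight) pairs with the master first, as in the PHP array."""
    master, master_weight, replicas = split_section_loads(entry)
    return [(master, master_weight)] + list(replicas.items())


if __name__ == "__main__":
    # Abbreviated s5 entry from the sample output above.
    s5 = [{"db1070": 0}, {"db1110": 500, "db1082": 200, "db1100": 100}]
    print(to_ordered_loads(s5)[0])  # ('db1070', 0)
```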

readOnlyBySection contains a dictionary of sections that have been set to read-only, with an explanation string as the value. It controls which sections MediaWiki treats as read-only, preventing any writes from reaching the master. While a section is read-only, attempts to edit on the affected wikis display a banner announcing that editing is not possible because the wiki is in read-only mode.
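Checking whether a section is currently read-only amounts to a dictionary lookup on the decoded config. A minimal Python sketch, assuming config is the already-decoded JSON shown above; read_only_reason is a hypothetical helper and the reason string is only an example.

```python
from typing import Optional


def read_only_reason(config: dict, section: str) -> Optional[str]:
    """Hypothetical helper: the explanation if `section` is read-only, otherwise None."""
    return config.get("readOnlyBySection", {}).get(section)


if __name__ == "__main__":
    # Example decoded config with s8 set to read-only.
    config = {"readOnlyBySection": {"s8": "Maintenance till 05:30AM UTC"}}
    print(read_only_reason(config, "s8"))  # Maintenance till 05:30AM UTC
    print(read_only_reason(config, "s1"))  # None
```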

Usage

Some examples of typical usage are below. See also conftool/extensions/dbconfig/README.md.

These commands must be run from the cumin cluster management hosts (e.g. cumin1001).

The --batch option exists for committing from scripts or other tools; it skips the interactive confirmation step:

dbctl config commit --batch -m "Your message"


Completely depool a host:

dbctl instance db1000 depool
dbctl config commit -m "Depool db1000"

Depool a host from a section and a group (api, recentchanges, vslow, dump...)

dbctl instance db1000 depool --section s8 --group api
dbctl config commit -m "Your commit message XXX"

Fully repool a host:

dbctl instance db1000 pool
dbctl config commit -m "Your commit message XXX"

Slowly repool a host to warm it up (10% of its usual weight):

dbctl instance db1000 pool -p 10
dbctl config commit -m "Your commit message XXX"
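Pooling with -p scales the host's usual weight, so the share of traffic it receives while warming up also depends on the other hosts' weights. The Python sketch below assumes the scaling is linear (10% means one tenth of the weight, as described above) and ignores any rounding dbctl may apply. The helper names are hypothetical, and the weights are the s5 replica weights from the sample output.

```python
def effective_weight(weight: int, percentage: int) -> float:
    """Hypothetical helper: weight when pooled at `percentage` percent (linear scaling assumed)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return weight * percentage / 100


def share_during_warmup(weights: dict, host: str, percentage: int) -> float:
    """Hypothetical helper: fraction of the section's general read traffic `host` gets."""
    adjusted = dict(weights)
    adjusted[host] = effective_weight(weights[host], percentage)
    total = sum(adjusted.values())
    return adjusted[host] / total if total else 0.0


if __name__ == "__main__":
    # Replica weights for s5 from the sample output above.
    s5 = {"db1110": 500, "db1113:3315": 1, "db1130": 500, "db1096:3315": 100,
          "db1082": 200, "db1097:3315": 100, "db1100": 100}
    for pct in (10, 100):
        print(pct, round(share_during_warmup(s5, "db1110", pct), 3))
```

With these weights, db1110 would get roughly 4.8% of s5's general read traffic at 10% and roughly 33% once fully pooled.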

Fully repool a host after it is already warmed up:

dbctl instance db1000 pool -p 100
dbctl config commit -m "Your commit message XXX"

Repool a host in a given group (api, recentchanges, vslow, dump...)

dbctl instance db1000 pool --section s8 --group api
dbctl config commit -m "Your commit message XXX"

Add a new host (i.e. a newly provisioned host) to a section

Before doing this, you MUST still add the host's IP address to db-eqiad.php and db-codfw.php. Don't forget, or you will cause a MediaWiki outage!

Presently, you also need to write a small Puppet patch under conftool-data, although we hope to eliminate this soon.

dbctl --scope eqiad instance db1000 edit
  • Your $EDITOR will open. Fill out all the data (template provided)
dbctl config commit -m "Your commit message XXX"

Changing weights for a host

dbctl instance db1100 set-weight 500 --section s8
dbctl config commit -m "Your commit message XXX"

Changing weights for a host in a group

dbctl instance db1000 set-weight 3 --section s8 --group api
dbctl config commit -m "Your commit message XXX"

Setting a section to read-only (e.g. for a master failover)

dbctl --scope eqiad section s8 ro "Maintenance till 05:30AM UTC"
dbctl config commit -m "Your commit message XXX"

Setting a section back to read-write (e.g. when a master failover is done and the step that restores read-write was skipped)

dbctl --scope eqiad section s8 rw 
dbctl config commit -m "Your commit message XXX"

Setting a host as the new master and depooling the previous master (which is what we normally do when failing over a master). Here db1100 is the new master and db01 stands for the previous master:

dbctl --scope eqiad section s8 set-master db1100
dbctl instance db01 depool 
dbctl config commit -m "Your commit message XXX"

Checking for any depooled hosts

dbctl instance all get  | jq 'select(..|.pooled? == false)'

Checking for depooled hosts in a given section

dbctl instance all get | jq 'select(.. | .sections? | has("s2")) | select(.. |  .pooled? == false)'

Checking all the instances associated with a given section

dbctl instance all get | jq 'select(.. | .sections? | has("s1"))'
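If you want to post-process the results further, similar filtering can be done in Python. This is a rough analogue of the jq filters above, not an exact translation. It assumes that dbctl instance all get prints a stream of JSON objects, each keyed by a single instance name, with a sections dict keyed by section name and pooled flags nested inside; that nesting, the SAMPLE data and all helper names are hypothetical.

```python
import json


def iter_json_stream(text: str):
    """Yield each JSON value from whitespace-separated JSON text, as jq reads its input."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # Skip whitespace between consecutive JSON values.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            return
        value, idx = decoder.raw_decode(text, idx)
        yield value


def has_depooled(node) -> bool:
    """True if any nested object has "pooled": false (cf. jq's `..|.pooled? == false`)."""
    if isinstance(node, dict):
        if node.get("pooled") is False:
            return True
        return any(has_depooled(v) for v in node.values())
    if isinstance(node, list):
        return any(has_depooled(v) for v in node)
    return False


def in_section(node, section: str) -> bool:
    """True if any nested "sections" dict has `section` as a key."""
    if isinstance(node, dict):
        sections = node.get("sections")
        if isinstance(sections, dict) and section in sections:
            return True
        return any(in_section(v, section) for v in node.values())
    if isinstance(node, list):
        return any(in_section(v, section) for v in node)
    return False


def depooled_instances(text: str, section=None) -> list:
    """Names of instances with any depooled entry, optionally limited to `section`."""
    names = []
    for obj in iter_json_stream(text):
        if section is not None and not in_section(obj, section):
            continue
        if has_depooled(obj):
            names.extend(obj)  # assumed: each object is keyed by its instance name
    return names


# Hypothetical input in the assumed shape (not real dbctl output).
SAMPLE = """
{"db1000": {"sections": {"s2": {"pooled": true, "weight": 100, "percentage": 100}}}}
{"db1001": {"sections": {"s2": {"pooled": false, "weight": 100, "percentage": 100}}}}
{"db1002": {"sections": {"s1": {"pooled": false, "weight": 0, "percentage": 100}}}}
"""

if __name__ == "__main__":
    print(depooled_instances(SAMPLE))        # ['db1001', 'db1002']
    print(depooled_instances(SAMPLE, "s2"))  # ['db1001']
```

To try it against live data, you could replace SAMPLE with sys.stdin.read() and pipe the output of dbctl instance all get into the script, after checking that the real output matches the assumed shape.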

Check live config

dbctl config get | jq '.eqiad|..|objects|.s1//empty'
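The jq filter above walks everything under the eqiad key and prints every value stored under an s1 key. A rough Python analogue is sketched below, assuming the output of dbctl config get has been parsed into a dict keyed by datacenter (as the .eqiad in the filter implies). section_entries is a hypothetical helper, and the sample config in the usage lines is made up for illustration.

```python
def section_entries(config_by_dc: dict, dc: str, section: str) -> list:
    """Hypothetical helper: every value stored under key `section` below config_by_dc[dc]."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            value = node.get(section)
            # Mirror jq's `//empty`: skip missing, null and false values.
            if value is not None and value is not False:
                found.append(value)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(config_by_dc.get(dc, {}))
    return found


if __name__ == "__main__":
    # Made-up config in the assumed shape.
    live = {"eqiad": {"groupLoadsBySection": {"s1": {"api": {"db1080": 1}}},
                      "sectionLoads": {"s1": [{"db1067": 0}, {"db1080": 100}]},
                      "readOnlyBySection": {}}}
    for entry in section_entries(live, "eqiad", "s1"):
        print(entry)
```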

Monitoring

Uncommitted dbctl diffs

This alert is similar to the puppet repo's unmerged changes alert. It indicates that changes have been made to the underlying instance or section objects in dbctl, but have not yet been committed (using dbctl config commit) to the live config read by MediaWiki.

dbctl config diff should show you what the deltas are. Inspecting who has recently logged into the cluster-management hosts (cumin*) may give you ideas as to who might have made the changes.

Emergency revert to static configs

It's hard to imagine a scenario where we'd need specifically this: in the event of an etcd outage, this alone wouldn't be sufficient to restore the site to working operation, and in the event of dbctl data being corrupted it wouldn't help, as it still depends on dbctl data. Anyway, here's a procedure:

Maintenance tasks

Building and deploying a new release

TODO

Schema upgrades

In the event you add a new field to the schema (example change) you will probably see a lot of logging output like this:

WARNING:conftool:Setting note to the default value 
WARNING:conftool:Setting note to the default value 
WARNING:conftool:Setting note to the default value

First, check for any diffs against production; you don't want to do this while someone else is actively making modifications!

dbctl config diff

Then simply (ab)use the edit subcommand in a shell one-liner to do a no-op read-'modify'-update on all the relevant objects:

for INST in $(dbctl instance all get | jq -r 'keys[0]'); do EDITOR=/bin/true dbctl instance "$INST" edit; done

Check for diffs again to make sure you didn't inadvertently stomp on anyone else's changes and that nothing went wrong:

dbctl config diff