dbctl
dbctl is a tool based on conftool to store Mediawiki's database loadbalancer configuration in etcd. Its code lives in here under conftool repository on Gitlab.
In production, the only hosts with dbctl installed are the cumin cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet
). Running on either cumin host is fine, but you will get much better performance running on the eqiad host as it is co-located with the RW etcd cluster.
Background
Prior to dbctl, database loadbalancer configuration was kept in static PHP files in the operations/mediawiki-config repo -- for example wmf-config/db-eqiad.php. This meant that routine maintenance operations would often require several configuration deploys, which is all of time-consuming, tedious, error-prone, and would sometimes block/be blocked by others doing 'real' deploys.
Schema
The JSON output is intended to correspond to elements in the $wgLBFactoryConf
configuration.
The contents of the etcd key /v2/keys/conftool/v1/mediawiki-config/$DATACENTER/dbconfig
will return a dict with one element with key val
and with a string value of JSON. Sample output (formatted manually for human readability):
"val": {"groupLoadsBySection": {"s4": {"contributions": {"db1097:3314": 1, "db1103:3314": 1}, "api": {"db1138": 1, "db1084": 3}, ... }, "sectionLoads": {"s5": [ {"db1070": 0}, {"db1110": 500, "db1113:3315": 1, "db1130": 500, "db1096:3315": 100, "db1082": 200, "db1097:3315": 100, "db1100": 100} ], "s4": [ ... ] }, "externalLoads": {"es1": [ {"es1012": 1}, {"es1016": 1, "es1018": 1} ], ...}, "readOnlyBySection": {} } }",
Similar output can be viewed using dbctl config get
or at https://noc.wikimedia.org/dbconfig/eqiad.json and https://noc.wikimedia.org/dbconfig/codfw.json.
Items in boldface are names of array elements in $wgLBFactoryConf
:
groupLoadsBySection contains a dictionary of 'sections' (groups of db servers that serve a specific set of wikis). For each section, the following information is provided in the value:
- name of the group ('api' for db servers that back MediaWiki api requests, 'vslow' for dbservers that handle extremely slow queries, and so on)
- dict of servers and relative weights for traffic within the group
sectionLoads contains a dictionary of 'sections' where each section's value is an array with exactly two elements:
- a dictionary with exactly one entry: the primary server, typically with 0 weight, which should only receive write traffic (and lag will be ignored by MW).
- a dict of db servers, with weights determining how read traffic should be apportioned for requests not to a specific db server group
The schema of sectionLoads is a compromise between Mediawiki's (ab)use of PHP conventions (where associative arrays are actually ordered, and the first value given is assumed to be the master), and JSON dictionaries (which are defined as being unordered).
readOnlyBySection contains a dictionary of sections which have been set to read-only, with an explanation string given as the value. This section controls which sections are set to be read-only by MW, preventing any writes from arriving to the master. When this is enabled all edits on the affected wikis will display a banner announcing that that edition cannot happen as the wikis is set to read-only.
Usage
Some examples of typical usage are below. See also conftool/extensions/dbconfig/README.md
In production, the only hosts with dbctl installed are the cumin cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet
). These commands must be run from those hosts.
Keep in mind that the option --batch
exists for committing from scripts or other tools. That allows to skip the confirmation step
dbctl config commit --batch -m "Your message"
Completely depool a host
dbctl instance db1000 depool dbctl config commit -m "Depool db1000"
Depool a host from a section and a group (api, recentchanges, vslow, dump..)
dbctl instance db1000 depool --section s8 --group api dbctl config commit -m "Your commit message XXX"
Automatically slowly repool a host
We have a script called repool that would automatically pool a host every 15 minutes with a given percentage of its original weight, in this example (10%, 25%, 50%, 75% and 100%)
./repool $hostname 'Repooling message' 10 25 50 75 100
Fully repool a host
dbctl instance db1000 pool dbctl config commit -m "Your commit message XXX"
Slowly repool a host to warm it up (10% of its usual weight)
dbctl instance db1000 pool -p 10 dbctl config commit -m "Your commit message XXX"
Fully repool a host after it is already warmed up
dbctl instance db1000 -p 100 dbctl config commit -m "Your commit message XXX"
Repool a host in a given group (api, recentchanges, vslow, dump...)
dbctl instance db1000 pool --section s8 --group api dbctl config commit -m "Your commit message XXX"
Add a new host (ie: a new provisioned host) to a section
Presently, you also need to write a small Puppet patch under conftool-data
, simply adding the name of the host to the dbconfig-instances file -- although we hope to eliminate this soon. (Do this before you do the steps below!)
dbctl --scope eqiad instance db1000 edit
- FIll out the data using the auto-generated template that will open in your $EDITOR
dbctl config commit -m "Your commit message XXX"
Removing / decommissioning a host
- Remove the host from being a part of any sections with
dbctl instance FOO edit
- Write a Puppet patch against
conftool-data
to remove the host from the YAML files there. puppet-merge will run the conftool syncer to remove data about the host from etcd. - Commit the config to also remove the host->IP mapping from the
hostsByName
section of the generated config:
dbctl config commit -m "Decomming FOO"
Changing weights for a host
dbctl instance db1100 set-weight 500 --section s8 dbctl config commit -m "Your commit message XXX"
Changing weights for a host in a group
dbctl instance db1000 set-weight 3 --section s8 --group api dbctl config commit -m "Your commit message XXX"
Setting a section on read only (ie: master failover)
dbctl --scope eqiad section s8 ro "Maintenance till 05:30AM UTC" dbctl config commit -m "Your commit message XXX"
Setting a section on read-write (ie: master failover done and we skip the restore command)
dbctl --scope eqiad section s8 rw dbctl config commit -m "Your commit message XXX"
Setting a host as new master and also depool the previous master (which is what we normally do when we failover a master)
dbctl --scope eqiad section s8 set-master db1100 dbctl instance db01 depool dbctl config commit -m "Your commit message XXX"
Checking for any depooled hosts
dbctl instance all get | jq 'select(..|.pooled? == false)'
Checking for depooled hosts in a given section
dbctl instance all get | jq 'select(.. | .sections? | has("s2")) | select(.. | .pooled? == false)'
Checking all the instances associated with a given section
dbctl instance all get | jq 'select(.. | .sections? | has("s1"))'
Find candidate masters for a given section
dbctl -s eqiad instance all get | jq 'select(.[].sections?.s4?.candidate_master?)'
Check live config
dbctl config get | jq '.eqiad|..|objects|.s1//empty'
Add new core section
Let's say that a new core section called s15 needs to be added to MW. This is an example patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/649890
On the puppet repo, add the section to:
conftool-data/dbconfig-section/sections.yaml
At the same time, or later, add the hosts that will serve that section to
conftool-data/dbconfig-instance/instances.yaml
Push and merge that puppet change. Once that is done you might need to populate the data for that new section with:
dbctl -s eqiad section s1 edit
This is the example of a current section
# dbctl -s eqiad section s1 get { "s1": { "flavor": "regular", "master": "db1083", "min_replicas": 6, "readonly": false, "ro_reason": "Maintenance on enwiki T251982" }, "tags": "datacenter=eqiad" }
The flavor
field differentiates normal "core" wiki metadata sections (s1, s2, s15, etc) from "external" data sections (x1, es1, pc1, etc). Set it to regular
for the former, and external
for the latter (if you are adding a new external
section that is not a parsercache section, please see the section about externalload for the additional steps)
It also needs to be added to the following puppet files:
modules/profile/files/conftool/json-schema/dbconfig/instance.schema
modules/profile/files/conftool/json-schema/mediawiki-config/dbconfig.schema
conftool-data/dbconfig-section/sections.yaml
See gerrit examples: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1025699 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1025603
Once you are ready to populate the hosts, make sure to edit and add the correct values with:
dbctl instance dbXXXX edit
Once you've got all the changes ready, push them with:
dbctl config commit -m "Populate new hosts for new section XX TXXXXXX"
Add new externalload section
If you need to add a external
section like x1, x2, or esX (i.e., not a parsercache section), apart from the above steps you'd need to edit a wmf-config/etcd.php
on the mediawiki-config repo to add that new section to the mapping array, see this example: https://gerrit.wikimedia.org/r/658218
Monitoring
I want to see changes pushed with dbctl
dbctl emits !logs into SAL (toolforge:sal/production?p=0&q=dbctl&d=) which include diffs from one configuration version to the next.
Uncommitted dbctl diffs
This alert is similar to the puppet repo's unmerged changes alert, and indicates that changes have been made to the underlying instance or section objects in dbctl, but those changes have not yet been committed to the live config as read by Mediawiki using dbctl config commit
.
dbctl config diff
should show you what the deltas are. Inspecting who has recently logged into the cluster-management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet
) may give you ideas as to who might have made the changes.
Runbook
Emergency revert to static configs
It's hard to imagine a scenario where we'd need specifically this, as in the event of an etcd outage this wouldn't be alone sufficient to restore the site to working operation, and in the event of dbctl data being corrupted this is no help as it still depends upon dbctl data, but anyway, here's a procedure:
- Use export.php to generate static PHP configurations from dbctl data.
- In
operations/mediawiki-config
, editwmf-config/etcd.php
, and insidefunction wmfEtcdApplyDBConfig
modifyif ( $wmgRealm === 'production' )
to be insteadif ( false )
. - Place the configurations generated above in their traditional spots in
db-eqiad.php
anddb-codfw.php
. - Then, do a configuration deployment as usual.
Building and deploying a new release
TODO
Schema upgrades
In the event you add a new field to the schema (example change) you will probably see a lot of logging output like this:
WARNING:conftool:Setting note to the default value WARNING:conftool:Setting note to the default value WARNING:conftool:Setting note to the default value
First check for any diffs vs production, you don't want to do this while someone else is actively making modifications!
dbctl config diff
Then simply (ab)use the edit
subcommand in a shell one-liner to do a no-op read-'modify'-update on all the relevant objects:
for INST in $(dbctl instance all get | jq 'keys[0]' -r) ; do EDITOR=/bin/true dbctl instance $INST edit; done
Performing this for sections is a bit more complicated as sections share names between datacenters:
for SCOPE in eqiad codfw; do for SECT in $(dbctl -s $SCOPE section all get | jq 'keys[0]' -r) ; do EDITOR=/bin/true dbctl -s $SCOPE section $SECT edit ; done ; done
Check for diffs again to make sure you didn't inadvertently stomp on anyone else's changes, or that something went wrong:
dbctl config diff