Portal:Data Services/Admin/Runbooks/Depool wikireplicas
Wikireplicas sometimes need maintenance, fail, or otherwise need attention. This page details how to depool each stage of the path to a wikireplica database.
Overview
In order to reach a wikireplica server from Wikimedia Cloud Services, traffic crosses three proxy layers. That provides several opportunities for depooling and reshuffling parts of the system for maintenance and downtime. Operations take place at the proxy layers or, potentially, in DNS.
VM Proxy
Hostnames: clouddb-wikireplicas-proxy-*.clouddb-services.eqiad1.wikimedia.cloud
The VM proxies hold 8 IPs each so that they can route requests coming into port 3306 (mysql) according to DNS. The mysql wire protocol does not allow a client to identify a hostname or database before authenticating to the server, partly because the server talks first, so you have to hit the right server for the handshake. For users to be able to select their database via the set of DNS CNAMEs that point back to one of the 8 section (mariadb instance) names, s1 through s8, each proxy needs 8 IPs. This layer of haproxy translates the incoming IP into a port on the next layer (since each section corresponds to a non-standard port number).
Each IP is mapped to a DNS CNAME:
# clouddb-wikireplicas-proxy-1 s[1-8].web.db.svc.wikimedia.cloud
# clouddb-wikireplicas-proxy-2 s[1-8].analytics.db.svc.wikimedia.cloud
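To illustrate how clients end up on the right section, a Toolforge or Cloud VPS user connects to one of these CNAMEs directly. A minimal sketch (the replica.my.cnf credentials file and the enwiki_p database are the usual user-side examples, not something defined by this runbook):
mysql --defaults-file=$HOME/replica.my.cnf -h s1.web.db.svc.wikimedia.cloud enwiki_p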
At this time, you can depool a VM proxy quite simply, without changing anything in OpenStack, by editing modules/openstack/files/util/wikireplica_dns.yaml so that the analytics and web IPs for the various sections all match the VM proxy you are not depooling.
Next, as root on a cloudcontrol host, run:
run-puppet-agent
to get your puppet change, and then run:
wmcs-wikireplica-dns
This directs all traffic at the remaining proxy (as soon as DNS caches expire and all existing connections are done), which can take a fair while. If you use this method, waiting an hour once the change is in place and you have run wmcs-wikireplica-dns is not a bad idea. After you have waited, you can be confident that any remaining connections will be cut when you stop haproxy on the depooled node.
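To confirm the DNS side of the change, you can spot-check a couple of section names from a Cloud VPS host. A minimal sketch (any of the s1 through s8 names will do, and the answers should point at the proxy you kept pooled):
host s1.web.db.svc.wikimedia.cloud
host s1.analytics.db.svc.wikimedia.cloud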
Repool by reverting your puppet patch and doing the same steps above!
LVS servers
Hostnames: lvs*.eqiad.wmnet
WMCS shouldn't be concerned with the LVS servers themselves. The main thing to know here is that LVS may trigger an alert when an upstream dbproxy server goes down, so a basic understanding of our LVS setup is still useful. The short summary is that the LVS servers decide how to route packets received from the VM proxies (see previous section) to the hardware proxies (see next section). The VM proxies send packets to either the wikireplicas-a.wikimedia.org IP or the wikireplicas-b.wikimedia.org IP. To modify how those IPs are routed, you can use the confctl CLI:
root@cumin1001:~# confctl select "name=dbproxy1018.eqiad.wmnet" get
{"dbproxy1018.eqiad.wmnet": {"weight": 0, "pooled": "yes"}, "tags": "dc=eqiad,cluster=wikireplicas-a,service=wikireplicas-a"}
{"dbproxy1018.eqiad.wmnet": {"weight": 0, "pooled": "no"}, "tags": "dc=eqiad,cluster=wikireplicas-b,service=wikireplicas-b"}
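To change the pooled status (for example, depooling dbproxy1018 from the wikireplicas-a service), something along these lines should work. This is a sketch using the standard confctl select/set syntax, so verify the selector against the get output above before running it:
root@cumin1001:~# confctl select "service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet" set/pooled=no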
Hardware proxies
Hostnames: dbproxy*.eqiad.wmnet
These servers (currently dbproxy1018.eqiad.wmnet and dbproxy1019.eqiad.wmnet) are the last layer before you actually hit a database server. They each have 2 LVS public IPs; if you ssh to one and run ip a show dev lo you'll see those IPs. LVS decides which of the 2 hosts receives traffic for a given IP (see the previous section). Those IPs are also mapped to 2 DNS names: wikireplicas-a.wikimedia.org and wikireplicas-b.wikimedia.org.
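To see which public IP belongs to which name (useful when matching IPs in the Hiera config described below), a quick lookup is enough. A minimal sketch using dig, though host or any other resolver works too:
dig +short wikireplicas-a.wikimedia.org
dig +short wikireplicas-b.wikimedia.org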
You'll also see that those IPs are repeated 8 times under profile::wmcs::db::wikireplicas::section_backends in the Prefix Puppet Hiera Config for clouddb-wikireplicas-proxy-1 and clouddb-wikireplicas-proxy-2 (make sure you are in the clouddb-services Horizon project to see those). These values are applied to the HAProxy config on the VM proxies to route traffic to one dbproxy or the other in normal operation. To drop one of the dbproxies from operation, find the Hiera Config that contains the public IP mapped to the dbproxy you want to depool, and swap out that IP for the one that is not being depooled. To find the current mapping, you can use confctl as explained in the previous section.
When the Hiera Config is saved, HAProxy will reload on the next puppet run. You can run puppet yourself, just to be sure, on the clouddb-wikireplicas-proxy-? server whose Hiera config you changed. This directs all new connections to the other hardware proxy. It will not cut off existing connections unless you restart haproxy, which is likely fine for most cases, since connections should not last longer than an hour or so even on the analytics proxy. Connections that are still active after one hour can be forcefully terminated.
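For example, on the affected clouddb-wikireplicas-proxy-? VM, a minimal sketch (run-puppet-agent is the same wrapper used on the cloudcontrol host earlier; the restart is only needed if you must cut existing connections immediately):
sudo run-puppet-agent
sudo systemctl restart haproxy   # optional: cuts existing connections right away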
Before restarting or shutting down the depooled dbproxy, remember to modify its pooled status with confctl (as shown in the LVS section above); otherwise LVS will trigger an alert.
Wikireplica database servers
Finally, wikireplica database servers are generally depooled using Puppet host hiera applied in operations/puppet to the dbproxy hardware server in question. Remember that most database servers run 2 mariadb instances (sections) each. The configuration of the hardware proxy layer is what allows depooling of database servers (and other manipulation of traffic).
The basic, default configuration is pulled from PuppetDB based on puppetized settings of the database servers themselves. In the default config, each analytics database instance is the standby for the web service of the same section number, and each web database instance is the standby for the analytics instance of the same section number, using some reduce and merge trickery in Puppet. Depooling is accomplished by overriding what the reduce functions produce for the affected sections in the resulting hash, using the profile::mariadb::proxy::multiinstance_replicas::section_overrides key in host hiera, which is merged at the end of the process.
Instructions and examples are provided in the comments of the host hiera yaml; for instance, see dbproxy1018.yaml or dbproxy1019.yaml in operations/puppet. The big difference here is that you need to manually reload haproxy yourself once puppet runs.
At any time, you can see the effective, in-memory configuration and status of the haproxy server by running the following as root:
root@dbproxy1018:~# echo "show stat" | socat /run/haproxy/haproxy.sock stdio
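If you only want a quick overview of pooled and standby state rather than the full CSV, the output can be trimmed. A sketch assuming HAProxy's standard stats CSV layout (columns 1, 2, and 18 are pxname, svname, and status):
root@dbproxy1018:~# echo "show stat" | socat /run/haproxy/haproxy.sock stdio | cut -d, -f1,2,18 | column -s, -t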
Coordinate with the Data Persistence team to make sure there are no ongoing operations you could interfere with, of course.
An example of doing this is provided in the comments of each proxy server's host hiera file, both to make it clearer and to allow quicker depooling. Once the appropriate hiera file is changed, you will need to do the same reloading process as for the legacy replicas for it to take effect.
As a basic example, if you wanted to depool clouddb1015.eqiad.wmnet specifically, and it is OK to leave it as the secondary on dbproxy1018 (which it normally would be), then you'd add (or uncomment) the following in hieradata/hosts/dbproxy1019.yaml and reload haproxy on that proxy. This does the job by simply overwriting the s4 and s6 keys:
profile::mariadb::proxy::multiinstance_replicas::section_overrides:
  s4:
    clouddb1019.eqiad.wmnet:
      ipaddress: 10.64.48.9
  s6:
    clouddb1019.eqiad.wmnet:
      ipaddress: 10.64.48.9
After you merge your puppet patch, log into the target dbproxy101x server and run:
sudo -i puppet agent -t
sudo systemctl reload haproxy
Again, you can check that everything did what you expected with the socat command above, run as root.
Sidenote on the overrides hiera
Besides ipaddress, other valid keys are weight, which only really matters if you have more than one host under the section key; the boolean standby, which adds the host entry to haproxy as a standby under that section; and depooled, which expressly removes the host entry from the resulting file. The last one is unlikely to be necessary, since adding an uncommented section key (e.g. s2:) will overwrite the automatically generated one anyway. It could be useful one day, possibly if we start doing deep merges for the overrides; at this time, it is a shallow merge.
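To illustrate the other keys, a hypothetical override keeping two hosts under one section, one weighted and one as standby, might look like the sketch below. The hostnames and IP addresses are placeholders only; the key names come from the list above:
profile::mariadb::proxy::multiinstance_replicas::section_overrides:
  s1:
    clouddb1013.eqiad.wmnet:
      ipaddress: 10.64.0.10   # placeholder, use the host's real address
      weight: 100
    clouddb1017.eqiad.wmnet:
      ipaddress: 10.64.0.11   # placeholder, use the host's real address
      standby: true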
Support contacts
If you are following this, you are probably already a part of the WMCS or Data Persistence team. Perhaps you can ask the team you are not on if you need more help?