MariaDB/Rebooting a host
The procedure for rebooting and reimaging is exactly the same; the only difference is the cookbook that you run in the #Run the reboot or reimage cookbook section below.
Clean shutdown
- First, get a list of instances on the host. The easiest way is to SSH into the host and check the MOTD, which will include something like this (see MariaDB/Multiinstance for more details):
DB section s1 (alias: mysql.s1)
DB section s3 (alias: mysql.s3)
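Alternatively, you can list the mariadb units configured on the host directly; a minimal sketch using systemd (single-instance hosts run a plain mariadb unit, multi-instance hosts run one mariadb@<section> unit per section):
# List all mariadb units known to systemd on this host
sudo systemctl list-units 'mariadb*' --all --no-legend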
- Depool the host if necessary:
- If any of the instances are known to dbctl, the host will need to be depooled if it's in an active DC, and repooled afterwards:
sudo dbctl instance <instance> get
# E.g. for single-instance host:
# sudo dbctl instance db1123 get
# E.g. for multi-instance host:
# sudo dbctl instance db1102:3312 get
# sudo dbctl instance db1102:3313 get
# sudo dbctl instance db1102:3320 get
- clouddb* hosts also need to be depooled before being rebooted, see Portal:Data Services/Admin/Runbooks/Depool wikireplicas for more details.
# To check the pooled status
cumin1002:~$ sudo confctl select name=clouddbXXXX.eqiad.wmnet get
# To depool a section
cumin1002:~$ sudo confctl select name=clouddbXXXX.eqiad.wmnet,service=sY set/pooled=no
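To repool a clouddb* section afterwards, the inverse of the depool command above should work (a sketch; adjust host and section):
cumin1002:~$ sudo confctl select name=clouddbXXXX.eqiad.wmnet,service=sY set/pooled=yes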
- Downtime the host in Icinga for 1h. If the host has any replicas, they will also need to be downtimed, to prevent replication alerts from firing.
sudo cookbook sre.hosts.downtime --hours 1 -r "Rebooting dbXXXX TXXXXXX" '<fqdn>'
- [optional] Flush dirty pages. InnoDB can take a while (hours) to write dirty pages out to disk on shutdown. If the instance must have a predictable downtime then make MariaDB begin flushing dirty pages well in advance of the shutdown process:
set global innodb_max_dirty_pages_pct=0;
Then poll:
show global status like '%dirty%';
When the counters approach zero, a shutdown and/or restart will be fast. Of course, since MariaDB is still handling traffic in the meantime, if write load is high the counters may never drop :-) and you will have to depool the box in that case.
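If you prefer to poll from the shell instead of an interactive mysql session, a minimal sketch (assuming root socket auth via sudo mysql, as used elsewhere on this page):
# Re-run the status query every 10 seconds; Innodb_buffer_pool_pages_dirty should trend towards zero
watch -n 10 "sudo mysql -e \"SHOW GLOBAL STATUS LIKE '%dirty%'\""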
- [optional] Disable buffer pool dump. On the default config, MariaDB will dump its buffer pool index to disk during the shutdown, and load it automatically on start, decreasing its warmup period. This is described in more detail in the MariaDB/buffer pool dump page. If you want to avoid this (e.g. because the current buffer pool is not fully loaded), connect to each mysql instance and run:
mysql> SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;
This option will not persist, and will revert to ON on the next reboot.
- If the host is replicating from another host, stop replication manually. This can take some time. If you reboot the host without stopping replication first, there is a high chance that the operating system kills the mariadb process before it can do a clean shutdown. This can lead to data corruption, forcing you to reload all the data from another host (a multi-hour or multi-day task!).
# Single-instance
sudo mysql -e "STOP SLAVE"
# Multi-instance, per instance:
sudo mysql.<section>
root@<host>:<section>[(none)]> STOP SLAVE;
If the host is currently replicating a slow transaction, STOP SLAVE will hang until the slow transaction is completed. If you press Ctrl+C to retry later, be aware that MariaDB will still apply the STOP SLAVE command as soon as the slow transaction completes.
- Stop the mariadb instance(s) on the host. This can also take up to several minutes, as MariaDB needs to empty the buffer pool. You should always wait for MariaDB to complete a clean shutdown, otherwise rebooting the host can lead to data corruption.
# single-instance:
sudo systemctl stop mariadb
# multi-instance, for each section:
sudo systemctl stop mariadb@<section>
# Avoid using a wildcard (mariadb@*), stop one section at a time
# e.g.:
# sudo systemctl stop mariadb@s2
# sudo systemctl stop mariadb@s3
# sudo systemctl stop mariadb@x1
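Before proceeding, confirm that the shutdown actually completed; a minimal sketch for a single-instance host (use mariadb@<section> for multi-instance):
# The unit should report "inactive" once the clean shutdown has finished
sudo systemctl is-active mariadb
# The journal should show the normal shutdown sequence, with no kill or timeout messages
sudo journalctl -u mariadb --since "30 minutes ago" | tail -n 20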
- Unmount /srv and disable swap. If you don't umount /srv manually, there is a risk that systemd does not wait for the umount to complete, and that can lead to data corruption.
sudo disable-puppet "server shutdown Txxxxxx"  # prevents puppet from messing up with /srv
sudo umount /srv
sudo swapoff -a
- If you are rebooting and not reimaging the host, this is a good time to upgrade MariaDB to the latest minor version. It's usually safe to upgrade other packages too.
sudo apt full-upgrade
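Before moving on to the reboot or reimage, it's worth double-checking that /srv is really unmounted and swap is off; a minimal sketch:
# findmnt prints nothing (and exits non-zero) if /srv is no longer mounted
findmnt /srv
# swapon prints nothing if all swap has been disabled
swapon --show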
Run the reboot or reimage cookbook
If you are rebooting the host, you can simply use sudo reboot, but it's preferable to use the sre.hosts.reboot-single cookbook, which will log to SAL, check that the host reboots successfully and wait for a successful Puppet run:
cumin1002:~$ sudo cookbook sre.hosts.reboot-single -r "Reason for reboot" <fqdn-of-host>
If you are reimaging the host, use the sre.hosts.reimage cookbook:
cumin1002:~$ sudo cookbook sre.hosts.reimage --os bookworm -t Txxxxxx <short-name-of-host>
When reimaging, double check that the partman recipe for the host you are reimaging is preserving the data partitions. The mapping from hostname to partman recipe is in the file preseed.yaml.
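One way to check is to grep for the host in the puppet repo; a sketch, assuming a local checkout of operations/puppet (the exact location of preseed.yaml may vary, and db1123 is a placeholder hostname):
# Show which partman recipe is mapped to the host
git grep -n 'db1123' -- '*preseed.yaml'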
After boot
The reboot and reimage cookbooks will stay in a loop until you manually restart MariaDB as described below. Don't wait for the cookbooks to complete, but follow the steps below while the cookbook is still running. When the cookbook finds a successful Puppet run, it should complete cleanly.
On most production hosts, the mariadb instance or instances won't restart automatically. This is intended behavior, to prevent a crashed host from being pooled automatically with corrupt data or lag before its health can be manually checked.
- [optional] If you are upgrading MariaDB to a new major version, or if you are doing any other kind of dangerous maintenance, it is better to avoid an automatic buffer pool load on start up. To do so, rename the file in each data directory from ib_buffer_pool to ib_buffer_pool.bak. This will make the old buffer pool unusable, while still allowing a dump to be produced the next time the instance shuts down for a normal restart.
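A sketch of the rename for one instance, assuming the usual /srv/sqldata[.<section>] datadir layout (adjust the path to the actual data directory of each instance):
# Example for a hypothetical s1 instance on a multi-instance host
sudo mv /srv/sqldata.s1/ib_buffer_pool /srv/sqldata.s1/ib_buffer_pool.bak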
- [optional] If you are upgrading MariaDB to a new major version, you want to make sure that MariaDB doesn't start replication automatically. On most db hosts this is the default, but if you want to be completely sure you can set the following environment variable:
systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
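Remember to clear the variable again once the maintenance is done, so that future restarts go back to the default behavior; a minimal sketch:
systemctl unset-environment MYSQLD_OPTS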
- Start MariaDB by running:
# Single-instance:
sudo systemctl start mariadb
# Multi-instance, per instance:
sudo systemctl start mariadb@<section>
Where <section> is one of the sections present on that particular server (m1, x1, etc.). Don't worry: only the sections configured in puppet will start; any others will fail to start if tried.
- On most db hosts, replication is configured not to start automatically. Check as follows (if replication is running, it will return IO thread running: Yes / SQL thread running: Yes):
# Single-instance
sudo mysql -e "SHOW SLAVE STATUS"
# Multi-instance, per instance:
sudo mysql.<section>
root@<host>:<section>[(none)]> SHOW SLAVE STATUS\G
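A compact way to pull out just the thread states on a single-instance host (a sketch; the relevant columns in SHOW SLAVE STATUS are Slave_IO_Running and Slave_SQL_Running):
sudo mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running'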
- If replication is stopped and should be running, you can start it as follows:
# Single-instance
sudo mysql -e "START SLAVE"
# Multi-instance, per instance:
sudo mysql.<section>
root@<host>:<section>[(none)]> START SLAVE;
- Force a Puppet run. The automatic Puppet run triggered by the reboot/reimage cookbook is very likely to have failed because MariaDB was not running, so now that you've started MariaDB you should run Puppet manually and make sure it completes with no errors. This should also allow the reboot/reimage cookbook to complete successfully if it was stuck in a loop at "Unable to find a successful Puppet run".
sudo puppet agent --enable
sudo run-puppet-agent
- Check that the prometheus mysql exporter is running, and start it manually if it isn't:
# Single-instance:
sudo systemctl status prometheus-mysqld-exporter
# Multi-instance, per instance:
sudo systemctl status prometheus-mysqld-exporter@<section>
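If the exporter isn't running, start it with the same unit names (a trivial sketch):
# Single-instance:
sudo systemctl start prometheus-mysqld-exporter
# Multi-instance, per instance:
sudo systemctl start prometheus-mysqld-exporter@<section>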
- [Only if you are rebooting a clouddb* wikireplica host] Check that the wmf-pt-kill service is running:
# Multi-instance, per instance:
sudo systemctl status wmf-pt-kill@<section>
- [Only if you are rebooting a primary host] Check that the pt-heartbeat service is running, and start it manually if it isn't, otherwise lag alerts will fire. Note: we should try not to reboot primary db instances for obvious reasons, and switch the active primary status to another host beforehand, but sometimes a reboot happens not by choice!
# Single-instance:
sudo systemctl status pt-heartbeat-wikimedia
# Multi-instance, per instance:
sudo systemctl status pt-heartbeat-wikimedia@<section>
- Check that there are no failed systemd units:
sudo systemctl list-units --failed
On some hosts you might find a failed prometheus-mysqld-exporter.service unit. This is something we should fix in Puppet, but in the meantime you can fix it with:
sudo systemctl disable prometheus-mysqld-exporter.service
sudo systemctl mask prometheus-mysqld-exporter.service
sudo systemctl reset-failed
- [Only if the host is replicating from another host] Check that the replication lag is back to zero.
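One way to check the lag from the host itself (a sketch for a single-instance host; use sudo mysql.<section> for multi-instance):
# Seconds_Behind_Master should return to 0 once the replica has caught up
sudo mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master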
- You can finally repool the host if it was depooled.
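For hosts managed by dbctl, a repool sketch (db1123 and the task ID are placeholders; clouddb* hosts are repooled via confctl as noted in the depool step above):
# Mark the instance as pooled and commit the change
sudo dbctl instance db1123 pool
sudo dbctl config commit -m "Repool db1123 after reboot TXXXXXX"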
If the server or the instance crashed
- Depool the host from production, if possible (dbctl, haproxy, etc.). If it is not possible, weigh the impact on availability against the possibility of exposing bad or outdated data (e.g. cache db vs enwiki primary server)
- Determine the root cause of the crash with OS logs (syslog), HW logs (mgmt interface), etc.
- Start the instance without replication starting automatically (systemctl set-environment MYSQLD_OPTS="--skip-slave-start")
- Start mariadb
- Check the error log: journalctl -u mariadb (or mariadb@<section>)
- Do a table check comparing it to another host (db-compare) to ensure all data is consistent between all servers of the same section
- Most production hosts have a configuration that makes them durable on crash (innodb_flush_log_at_trx_commit=1). However, not all kinds of crash can ensure consistency (e.g. HW RAID controller failure)
- If the server looks good, start replication and repool it into service
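A small sketch of post-crash checks on a single-instance host, combining the journal and durability checks above (assumes root socket auth; db-compare usage is documented elsewhere):
# Look at the crash-recovery messages from the restart
sudo journalctl -u mariadb --since "1 hour ago" | grep -i -E 'crash|recovery|error'
# Confirm the durability setting mentioned above
sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"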
This page is a part of the SRE Data Persistence technical documentation