MariaDB/Upgrading a section

From Wikitech

Rolling restarts are used to perform security updates or upgrade minor or major MariaDB versions.

See the Kernel and MariaDB version dashboard: https://zarcillo.wikimedia.org/ui/hosts

Monitor kernel update progress with: https://grafana.wikimedia.org/d/fcnrmzq/mariadb-kernel-versions

Monitor MariaDB update progress with: https://grafana.wikimedia.org/d/bd9fc9e2-6bb5-463d-a783-87c59d23b6f1/mariadb-versions

Using the dbtools scripts to update/reboot multiple hosts

The dbtools scripts repository contains rolling_restart.py and similar scripts.

Updating the s section

Use rolling_restart.py. The script performs a rolling restart of database hosts while also updating the OS. It updates MariaDB as part of the OS update according to Puppet's configuration.

The script handles, in order:

  1. Silencing of alerts
  2. Depooling
  3. Updating OS including MariaDB (if a new version is present and configured in puppet)
  4. Rebooting (switches to a new kernel if found)
  5. Checking replication lag (except for RO `es*` sections)
  6. Repooling
  7. Enabling alerts
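For reference, the per-host sequence can be sketched as a shell dry run. The host name and the downtime/dbctl invocations below are illustrative assumptions, not the script's actual implementation:

```shell
#!/bin/bash
# Dry-run sketch of the per-host sequence rolling_restart.py automates.
# The host name and the exact downtime/dbctl commands are illustrative only.
host="db1111"  # hypothetical example host
steps=(
  "cookbook sre.hosts.downtime ${host}.eqiad.wmnet -t TXXXXXX -r 'rolling restart'"  # 1. silence alerts
  "dbctl instance ${host} depool && dbctl config commit -m 'Depool ${host}'"         # 2. depool
  "apt-get update && apt-get -y dist-upgrade"                                        # 3. update OS incl. MariaDB
  "reboot"                                                                           # 4. pick up the new kernel
  "mysql -e 'SHOW SLAVE STATUS\G'"                                                   # 5. wait until lag is gone
  "dbctl instance ${host} pool && dbctl config commit -m 'Repool ${host}'"           # 6. repool
  "remove the downtime, or let it expire"                                            # 7. re-enable alerts
)
printf '%s\n' "${steps[@]}"
```

The script performs these steps itself; the sketch only shows their order and rough shape.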

Usage:

  • Check out its repository and auto_schema in a dedicated directory on a cumin host
  • Edit rolling_restart.py global variables
  • Run in screen/tmux/byobu as:
sudo PYTHONPATH=../auto_schema python3 rolling_restart.py --run

Updating the es section

For the read only hosts:

# See:
./rolling_restart_es.py -h

# Example:
sudo ./rolling_restart_es.py -t T419961 -r'Security updates' 6.1.164 check

Updating the ms and pc sections

# See:
./rolling_restart_pc_ms.py -h

# Example:
sudo ./rolling_restart_pc_ms.py -t T419961 -r'Security updates' 12:6.1.164 13:6.12.74 reboot ms


Order of upgrades

  • Upgrade clouddb* hosts.
  • Upgrade Sanitarium hosts in both DCs
  • Upgrade the Sanitarium primaries in both DCs and ensure the Sanitarium host replicates from the 10.4 one in the active DC
  • Upgrade the candidate master on the standby DC
  • Upgrade the backup source in the standby DC (coordinate with Jaime)
  • Upgrade the master in the standby DC
  • Upgrade the candidate master in the primary DC
  • Upgrade the backup source in the primary DC (coordinate with Jaime)
  • Switch over the primary host in the primary DC to a Buster+10.4 host
  • Upgrade the old primary and make it a candidate primary


Upgrading MariaDB minor version on a single host

Use the sre.mysql.upgrade cookbook.

Upgrading MariaDB major version on a single host

Use the sre.mysql.major-upgrade cookbook.


Upgrade procedure (legacy)

  • Patch the dhcp file: [example]
  • Run puppet on install1003 and install2003
  • Depool the host (if needed) using software/dbtools/depool-and-wait
  • Silence the host in Icinga (e.g. on a cumin host, cookbook sre.hosts.downtime xxxx.wmnet -D1 -t TXXXXXX -r "reimage for upgrade - TXXXXXX")
  • Stop MySQL on the host
  • Run umount /srv; swapoff -a
  • Run reimage: sudo -E sudo cookbook sre.hosts.reimage xxxx.wmnet -p TXXXXXX
  • Wait until the host is up
  • Run systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
  • Run chown -R mysql. /srv/*; systemctl start mariadb ; mysql_upgrade
  • Run systemctl restart prometheus-mysqld-exporter.service
  • Drop the host from Tendril and re-add it, otherwise its Tendril metrics won't get updated
  • Check all the tables before starting replication (this can take up to 24h depending on the section)
    • In a screen run: mysqlcheck --all-databases
    • If any corruption is discovered, fix it by rebuilding every flagged table with binary logging disabled:
      journalctl -xe -u mariadb | grep table | grep Flagged \
        | awk -F "table" '{print $2}' | awk -F " " '{print $1}' \
        | tr -d '`' | uniq >> /root/to_fix
      for i in $(cat /root/to_fix); do
        echo "$i"
        mysql -e "set session sql_log_bin=0; alter table $i engine=InnoDB, force"
      done
  • Start the replica
  • Wait until the host is up
  • Repool the host.

The sre.mysql.upgrade cookbook accepts a Cumin query to match one or more hosts and:

  1. Depools the host using the sre.mysql.depool cookbook.
  2. Downtimes it
  3. Stops replication
  4. Upgrades packages including MariaDB using apt-get dist-upgrade
  5. Reboots it
  6. Starts MariaDB and runs mysql_upgrade then restarts replication
  7. Optionally repools it if --repool was passed
  8. Logs in Phabricator
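A minimal invocation might look like the following. The query string and the exact flag set are assumptions; check the cookbook's own help output on a cumin host before running it:

```shell
# Hypothetical example only: the Cumin query and flags are assumptions,
# except --repool, which the cookbook documents for optional repooling.
query="db1111.eqiad.wmnet"   # Cumin query matching the host(s) to upgrade
cmd="sudo cookbook sre.mysql.upgrade '${query}' --repool"
echo "$cmd"
```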


Previous manual steps (obsolete)

  1. You should log that maintenance is about to happen:
    !log Upgrade db1111 T123456
    
  2. The package for the mariadb server must be upgraded, usually:
    sudo apt upgrade 'wmf-mariadb*'
    
    where wmf-mariadb* matches the package you want to upgrade to, e.g. wmf-mariadb104 for WMF's build of MariaDB 10.4. The WMF package is built to avoid side effects: it won't automatically try to stop, restart or otherwise alter a running instance, so it can be installed at any time, even while a previous version is executing. But unless there is a reason for it (e.g. minimizing upgrade downtime), the upgrade should probably be run after all current instances are shut down.
  3. Start mysql in a safe way: do not start replication automatically, and remove any old buffer pool dump:
    sudo systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
    <for each datadir> sudo mv ib_buffer_pool ib_buffer_pool.bak
    
  4. mysql_upgrade must be run on every instance after startup and before replication starts. For single instance hosts:
    systemctl start mariadb
    systemctl status mariadb  # check it started correctly (some errors on first startup are expected during an upgrade, caused by old table formats)
    mysql_upgrade
    
    For multiple instance hosts, for each instance:
    sudo systemctl start mariadb@<section>
    sudo systemctl status mariadb@<section>  # check it started correctly (some errors on first startup are expected during an upgrade, caused by old table formats)
    sudo mysql_upgrade -S /run/mysqld/mysqld.<section>.sock
    
    where <section> is each of the instances to upgrade on that host (e.g. s1 and s2, x1, s5 and s4, etc.)
  5. After the upgrade, if the mysql database changed, it is important to perform a restart. This is normally skippable for minor upgrades, but it guarantees the server starts with the upgraded table formats:
    sudo systemctl restart mariadb # or sudo systemctl restart mariadb@<section> (for each section upgraded)
    

The rest of the steps to get the server into a production state are the same as for a regular reboot/restart: MariaDB/Rebooting_a_host (restart replication, repool, re-enable monitoring, safety checks)