MariaDB/Switch Datacenter

From Wikitech

The week before the switchover

  • 7 days before: stop all maintenance on the database clusters.
  • 6 days before: enable circular replication between eqiad and codfw.
    • This requires updating section_params in hieradata/common/profile/mariadb.yaml. E.g. gerrit:719168
  • In the new DC:
    • Check that GTID is disabled on primaries, and disable it where it is still enabled.
    • Check that all replicas have GTID enabled.
    • Check for disabled notifications (Icinga) and silences (Alertmanager).
    • Check that the query killers are installed and enabled.
    • Review MW weights, comparing them to the old DC.
    • Warm up the caches using queries from the old DC.
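
The GTID checks in the list above can be partly scripted. The following is a minimal sketch, not an established tool: the function names using_gtid and check_gtid are hypothetical, and it assumes the status text comes from MariaDB's SHOW SLAVE STATUS\G, whose Using_Gtid field reports No, Slave_Pos or Current_Pos. It also assumes that the primary is itself replicating (circular replication is already enabled), so an empty status is treated as a failure that needs a human look.

```shell
#!/bin/sh
# Sketch: read `SHOW SLAVE STATUS\G` output on stdin and print the Using_Gtid value.
using_gtid() {
    awk -F': *' '$1 ~ /^[[:space:]]*Using_Gtid$/ { print $2 }'
}

# Sketch: check the Using_Gtid value against what the checklist expects.
# $1 = role ("primary" or "replica"); the status output is read from stdin.
# Returns 0 when the value matches, 1 when it does not, 2 for an unknown role.
check_gtid() {
    role=$1
    value=$(using_gtid)
    case "$role" in
        # Primaries are expected to have GTID disabled.
        primary) [ "$value" = "No" ] ;;
        # Replicas are expected to have GTID enabled in either mode.
        replica) [ "$value" = "Slave_Pos" ] || [ "$value" = "Current_Pos" ] ;;
        *) return 2 ;;
    esac
}
```

On a database host this could be used as, for example, sudo mysql -e 'SHOW SLAVE STATUS\G' | check_gtid replica (how the client is invoked on a given host is an assumption). If a primary still reports GTID as enabled, the usual MariaDB statements to disable it are STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID = no; START SLAVE; run them manually after confirming that this matches the current procedure.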

The day of the switchover

Before the switchover

  • Downtime all db primaries just before the switch, so that read-only alerts won't fire (T285803).

After the switchover

  • Manually fix parsercache hosts and x2 in tendril: T266723
  • Submit a puppet patch changing host-down alerting:
    • Background: gerrit:736415
    • Move profile::monitoring::is_critical: true from hieradata/role/<old dc>/mariadb/* to hieradata/role/<new dc>/mariadb/
    • Re-run puppet: sudo cumin 'A:db-core or A:db-parsercache' 'run-puppet-agent -q'
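
Before submitting the patch, it can help to confirm that the is_critical key really moved. This is a minimal sketch, assuming a local checkout of the puppet repository with the hieradata/role/<dc>/mariadb layout described above; the function names critical_files and check_is_critical_moved are hypothetical, and the check only looks for the literal line profile::monitoring::is_critical: true.

```shell
#!/bin/sh
# Sketch: list hiera files under a DC's mariadb role directory that mark hosts as critical.
# $1 = path to a puppet repository checkout (assumed), $2 = datacenter name (e.g. eqiad or codfw).
critical_files() {
    repo=$1
    dc=$2
    grep -rl -- 'profile::monitoring::is_critical: true' "$repo/hieradata/role/$dc/mariadb" 2>/dev/null
}

# Sketch: succeed only when the old DC has no critical entries left
# and the new DC has at least one.
# $1 = repo path, $2 = old DC, $3 = new DC.
check_is_critical_moved() {
    [ -z "$(critical_files "$1" "$2")" ] && [ -n "$(critical_files "$1" "$3")" ]
}
```

For a switch from eqiad to codfw, check_is_critical_moved /path/to/puppet eqiad codfw would return success once the patch is applied to the checkout; critical_files on its own prints the files that still carry the key.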

The days after the switchover

  • 2 days after: disable circular replication and update section_params in hieradata/common/profile/mariadb.yaml accordingly. E.g. gerrit:721421