VRT System/Failover

From Wikitech

VRTS has one active host (currently otrs1001) and one replica (vrts2001).


The host to fail over to should be a proper VRTS replica, meaning that it:

  • is running the puppet role(vrts)
  • has the same files as the primary in /opt. An rsync setup exists for this and can be run with: sudo /usr/bin/rsync --rsh /usr/local/sbin/sync-vrts-ssl-wrapper -av --progress rsync://otrs1001.eqiad.wmnet/vrts /opt/
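Before a failover it can be useful to preview what the sync would change on the replica. The following is a minimal sketch, not part of the documented setup: it prints the rsync command from the list above, optionally with rsync's standard --dry-run flag added. The PRIMARY variable and the vrts_sync_cmd function are hypothetical names introduced for illustration, and whether a dry run behaves as expected through the SSL wrapper is an assumption to verify on the host.

```shell
#!/bin/sh
# Sketch only: PRIMARY and vrts_sync_cmd are illustrative names,
# not part of the existing VRTS tooling.
PRIMARY="${PRIMARY:-otrs1001.eqiad.wmnet}"

vrts_sync_cmd() {
    # $1 = "dry-run" to add rsync's --dry-run flag (preview, no writes to /opt).
    extra=""
    if [ "$1" = "dry-run" ]; then
        extra="--dry-run "
    fi
    # Print the command instead of running it, so it can be reviewed first.
    printf '%s\n' "sudo /usr/bin/rsync --rsh /usr/local/sbin/sync-vrts-ssl-wrapper -av --progress ${extra}rsync://${PRIMARY}/vrts /opt/"
}

# Show the preview variant of the command.
vrts_sync_cmd dry-run
```

Running the printed command on the replica should list the files that differ from the primary without copying anything. Calling vrts_sync_cmd with no argument prints the original command unchanged.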

Planned Failover

A planned failover means the old production instance is still responding and working properly. The following steps are needed to fail over to a new host:

  • Log in to the VRTS dashboard with an admin account and schedule a system maintenance window for the time of the planned failover. This can be done from Admin -> System Maintenance. This step is important because one of the critical requirements during a failover is that no one is writing to the database. In maintenance mode only admins can log in, which greatly reduces the number of people actively using the system, and admins can easily be told not to perform any critical tasks during the failover.
  • Prepare the DNS patch: in the DNS repo, open the wmnet template and change the record that ticket points to. It is in the "misc services without multiple backends" section.
  • Ensure the new host is listed as the active_host in the hieradata/role/common/vrts.yaml file. This ensures that it points to the write database in eqiad. Since there are only two hosts, you can simply swap the values of active_host and passive_host.
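The swap of active_host and passive_host described in the last step can be sketched as a small shell helper. This is an illustration under stated assumptions, not the documented procedure: swap_vrts_hosts is a hypothetical function name, and the sketch assumes the two keys appear as unindented, single-line "key: value" entries whose names end in active_host and passive_host, with no trailing comments. The exact key names in the real file may differ, so check the output before committing it.

```shell
#!/bin/sh
# Sketch only: swap_vrts_hosts is an illustrative helper, not existing tooling.
# Assumes unindented one-line entries such as (hypothetical key prefix):
#   some::prefix::active_host: <host A>
#   some::prefix::passive_host: <host B>
swap_vrts_hosts() {
    # $1 = path to the hiera YAML file; prints the swapped content to stdout.
    awk '
        # First pass: remember the current value of each key.
        NR == FNR {
            if ($1 ~ /passive_host:$/)     passive = $2
            else if ($1 ~ /active_host:$/) active = $2
            next
        }
        # Second pass: emit each key with the value of the other key.
        $1 ~ /passive_host:$/ { print $1, active; next }
        $1 ~ /active_host:$/  { print $1, passive; next }
        { print }
    ' "$1" "$1"
}

# Example (run from a checkout of the puppet repo, then review the diff):
#   swap_vrts_hosts hieradata/role/common/vrts.yaml > /tmp/vrts.yaml.new
#   diff hieradata/role/common/vrts.yaml /tmp/vrts.yaml.new
```

The helper writes to standard output rather than editing the file in place, so the change can be inspected with diff before it goes into a patch.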

Unplanned Failover

An unplanned failover means the old production instance is not responding or has been lost.