Dumps/SQL-XML Dumps/Swapping NFS servers

From Wikitech

Sometimes you may want to swap the primary and fallback NFS servers for the XML/SQL dumps. If they are both operational, here is the procedure used to do it.

  • Make sure dumps are idle and that all data from /data/xmldatadumps/public is current on the fallback server
  • Stop puppet on both hosts
  • Stop the dumps-rsyncer service on the primary host (systemctl stop dumps-rsyncer.service)
  • Check for any /usr/local/bin/rsync..*sh process on the primary and kill it if there is one.
  • Remove the unit file for the rsyncer service outright. It's likely in /usr/lib/systemd rather than in /etc/systemd; a find will turn it up.
  • Rsync the /data/xmldatadumps/private directory from the primary to the fallback host
    You might do a dry run first, doing something like rsync -av --dry-run --itemize-changes /data/xmldatadumps/private/ dumpsdata1001.eqiad.wmnet::data/xmldatadumps/private/ (adjust for hostnames, check that the path is still right, etc)
  • Rsync the /data/temp directory (used for testing) to the fallback host; you will need to edit /etc/rsyncd.conf on the fallback host to not exclude **temp
    As with the private directory, you may want to do a dry run first to make sure your paths are right.
  • Disable puppet on the xml/sql snapshot hosts (NOT the one running misc dumps)
  • Stop the dumps-monitor service on the snapshot running it (systemctl stop dumps-monitor.service)
  • Swap the roles of the primary and fallback in puppet:
    • Swap the hiera/host yaml files
    • Swap the settings for the two hosts in hieradata/common.yaml and profile/dumps.yaml, making one the NFS server and the other an internal server that receives copies
    • Do a grep to make sure there's nothing else naming the two servers in the puppet repo that you might need to swap around
    • Comment out the cron jobs for the root and dumpsgen users on the two hosts (puppet will put them back correctly). OTHERWISE YOU MAY END UP WITH DUPLICATE JOBS, VERY BAD
    • Remove /data/xmldatadumps/public/dumpstatusfiles.tar.gz on both dumpsdata hosts.
    • Do a permissions change on the new primary, just in case an unnoticed interrupted rsync left things in a weird state:
      from /data/xmldatadumps/public dir, chown -R dumpsgen:dumpsgen *wik*
      from the same dir, chmod a+r .
  • Enable puppet on the old primary, run it, make sure it does not start up the rsyncer shell script and that the cron jobs look right (compare to the ones you commented out on the old fallback)
  • Enable puppet on the old fallback, run it, make sure the rsyncer shell script is started and that the cron jobs look right (compare to the ones you commented out on the old primary)
  • Fix up the snapshot hosts:
    • Make sure no job of the dumpsgen user is running on the snapshot
    • Unmount the /mnt/dumpsdata NFS share
    • Enable and run puppet
    • Make sure that /mnt/dumpsdata now comes from the new primary host
    • On the snapshot with the monitor, make sure the monitor has started up
  • Finally: on the snapshot testbed, try a test run of a small wiki. For example (as the dumpsgen user; make sure you use the TESTS configfile):
    • python3 ./worker.py --configfile /etc/dumps/confs/wikidump.conf.tests --exclusive --log dewikiversity
  • Update Dumps/Dumpsdata hosts with the new server info
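
The rsync dry runs and the mount check in the steps above can be sketched as a couple of small shell helpers. This is an illustration only, not part of the documented procedure: the function names rsync_cmd and mount_from_host are made up for this sketch, the idea that /data/temp is reachable as ::data/temp/ is extrapolated from the private directory example and should be checked against /etc/rsyncd.conf, and the hostname is simply the one used in the example above.

```shell
#!/bin/bash
# Sketch only: rsync_cmd and mount_from_host are hypothetical helpers,
# not existing tools on the dumpsdata or snapshot hosts.

# Print (do not run) the rsync command that copies a directory under /data
# to the same location in the "data" rsync module on the destination host.
# The third argument selects a dry run ("dry", the default) or a real copy.
rsync_cmd() {
    local dir="$1" dest="$2" mode="${3:-dry}"
    local opts="-av"
    if [ "$mode" = "dry" ]; then
        opts="$opts --dry-run --itemize-changes"
    fi
    echo "rsync $opts /data/${dir}/ ${dest}::data/${dir}/"
}

# Succeed if an NFS mount source of the form "host:/path" names the
# expected host, fail otherwise.
mount_from_host() {
    local source="$1" expected="$2"
    [ "${source%%:*}" = "$expected" ]
}
```

For example, on the primary, rsync_cmd xmldatadumps/private dumpsdata1001.eqiad.wmnet dry prints the dry-run command so it can be reviewed before anything is copied. On a snapshot host after the swap, mount_from_host "$(findmnt -n -o SOURCE /mnt/dumpsdata)" followed by the new primary's hostname succeeds only if the share is served from that host.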