Procedure:
* Reroute writes to another Redis server https://gerrit.wikimedia.org/r/276105
* Remove the server from the jobqueues config https://gerrit.wikimedia.org/r/276106
:* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'puppet agent -t'
:* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'restart jobchron'
:* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'puppet agent -t'
:* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'restart jobchron'
* Stop the Redis instances and copy the rdb files off the server's disks
* Reimage the server
* Copy the rdb files back before the first full puppet run
* Run puppet and verify that the keys are present in Redis
* Revert the two changes above and deploy the reverts
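The "verify the keys are there" step can be made less error-prone by comparing the total key count before the reimage and after the rdb files are restored. The following is only a minimal sketch: the helper name <code>total_keys</code> is hypothetical, and the port used in the commented usage is an assumption (any authentication options needed by the instance are omitted). It sums the per-database counts printed by <code>redis-cli INFO keyspace</code>.

```shell
#!/bin/bash
# Hypothetical helper: sum the key counts from `redis-cli INFO keyspace`.
# Expected input lines look like: db0:keys=1234,expires=12,avg_ttl=0
total_keys() {
    # Split on "=" and "," so that field 2 is the number after "keys=".
    awk -F'[=,]' '/^db[0-9]+:keys=/ { sum += $2 } END { print sum + 0 }'
}

# Usage sketch (port 6379 is an assumption; adjust per instance):
#   before=$(redis-cli -p 6379 INFO keyspace | total_keys)   # before stopping Redis
#   after=$(redis-cli -p 6379 INFO keyspace | total_keys)    # after restore + puppet
#   [ "$before" = "$after" ] && echo "key count matches" || echo "key count differs"
```

Counts can legitimately differ slightly if keys expired in between, so a mismatch is a prompt to look closer rather than proof of data loss.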
Useful commands to check status:
1) Connections from mw* hosts that are still established on the Redis job queue server (rdb1003), counted per client:
elukey@neodymium:~$ sudo salt -t 120 'rdb1003*' cmd.run "netstat -tuap | grep ESTAB | awk '{print \$5}' | grep mw | cut -d: -f1 | sort | uniq -c"
2) Check whether the job runners and video scalers still hold connections to rdb1003 (10.64.0.201):
elukey@neodymium:~$ sudo salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
elukey@neodymium:~$ sudo salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
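The per-client count in item 1 can also be done in a single awk step, which avoids the chain of grep/cut calls. This is a sketch under assumptions, not the command that was actually run: the function name <code>count_mw_peers</code> is hypothetical, it expects <code>netstat -tua</code> style output on stdin (foreign address in column 5, state in column 6), and unlike the original <code>grep mw</code> it only matches foreign addresses that start with "mw".

```shell
#!/bin/bash
# Hypothetical helper: count ESTABLISHED connections per remote mw* host.
# Reads netstat-style lines on stdin:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
count_mw_peers() {
    awk '$6 == "ESTABLISHED" && $5 ~ /^mw/ {
             split($5, addr, ":")      # addr[1] = host, addr[2] = port
             n[addr[1]]++
         }
         END { for (h in n) print n[h], h }' | sort -k2
}

# Usage sketch, run on the Redis host itself:
#   netstat -tua | count_mw_peers
```

An empty result means no mw* client holds an established connection any more, which is the state to wait for before stopping Redis.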