User:Elukey/Ops/JobQueue

Procedure:
    
    * Reroute writes to another Redis instance: https://gerrit.wikimedia.org/r/276105
    * Remove the server from the jobqueues config (https://gerrit.wikimedia.org/r/276106), then run puppet and restart jobchron on the job runners and video scalers:
    :* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'puppet agent -t'
    :* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'restart jobchron' 
    :* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'puppet agent -t'
    :* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'restart jobchron'
    * Stop the Redis instances and copy the RDB files off the disks
    * Reimage the server
    * Copy the RDB files back before a full puppet run
    * Run puppet and verify that the keys are present in Redis
    * Revert the two changes above and deploy the reverts
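
One way to make the "verify the keys are there" step concrete is to record the total key count before stopping Redis and compare it after the restore. The following is a minimal sketch, not part of the original procedure: the helper name `sum_keys` is hypothetical, and it assumes `redis-cli` can reach the instance (port and authentication options depend on the local setup and are not shown in these notes).

```shell
# Sum the per-database key counts found in `redis-cli INFO keyspace` output.
# Reads the INFO output on stdin; lines look like: db0:keys=123,expires=0,avg_ttl=0
# (sum_keys is a hypothetical helper name, not an existing tool.)
sum_keys() {
    # Split on '=' and ',' so that field 2 is the number following "keys=".
    awk -F'[=,]' '/^db[0-9]+:keys=/ { total += $2 } END { print total + 0 }'
}
```

Possible usage, assuming a local instance on the default port: run `redis-cli -p 6379 INFO keyspace | sum_keys` before the reimage and again after the puppet run, then compare the two numbers.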
    
    Useful commands to check status:
    1) Connections still open to the Redis job queue on rdb1003, counted per client host (note the escaped \$5, so the local shell does not expand it inside the double quotes):
         elukey@neodymium:~$ sudo -i salt -t 120 'rdb1003*' cmd.run "netstat -tuap | grep ESTAB | awk '{print \$5}' | grep mw | cut -d: -f1 | sort | uniq -c"
    2) Connections to rdb1003 (10.64.0.201) from the job runners and video scalers:
         elukey@neodymium:~$ sudo salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
         elukey@neodymium:~$ sudo salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
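
The checks above can also be reduced to a single number, which is convenient when waiting for connections to drain before stopping Redis. This is a rough sketch under stated assumptions: `count_estab_to` is a hypothetical helper, and it assumes the usual `netstat -tunap` column layout (foreign address in column 5, state in column 6).

```shell
# Count ESTABLISHED connections whose foreign address is exactly the given IP.
# Reads `netstat -tunap` output on stdin.
# (count_estab_to is a hypothetical helper name, not an existing tool.)
count_estab_to() {
    local ip="$1"
    awk -v ip="$ip" '
        $6 == "ESTABLISHED" {
            host = $5
            sub(/:[^:]*$/, "", host)   # strip the trailing :port
            if (host == ip) c++
        }
        END { print c + 0 }'
}
```

Possible usage on a job runner: `netstat -tunap | count_estab_to 10.64.0.201`. Unlike a plain `grep`, this matches the address exactly and ignores sockets in states such as TIME_WAIT.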