User:Elukey/Ops/JobQueue
Procedure:
* Reroute writes to another Redis instance: https://gerrit.wikimedia.org/r/276105
* Remove the server from the jobqueues config: https://gerrit.wikimedia.org/r/276106
:* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'puppet agent -t'
:* salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'restart jobchron'
:* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'puppet agent -t'
:* salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'restart jobchron'
* Stop the Redis instances and copy the rdb files off the disks
* Reimage the server
* Copy the rdb files back before a full puppet run
* Run puppet and verify that the keys are present in Redis
* Revert the two changes and deploy them

Useful commands to check status:

1) Connections from the mw hosts still open on the Redis job queue server (rdb1003):
 elukey@neodymium:~$ sudo -i salt -t 120 rdb1003* cmd.run "netstat -tuap | grep ESTAB | awk '{print \$5}' | grep mw | cut -d: -f1 | sort | uniq -c"

2) Check for connections to rdb1003 (10.64.0.201) among the job runners and video scalers:
 elukey@neodymium:~$ sudo salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
 elukey@neodymium:~$ sudo salt -C 'G@cluster:videoscaler and G@site:eqiad' cmd.run 'netstat -tunap | grep 10.64.0.201'
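
The first connection check above can also be wrapped in a small helper, which is handy when deciding whether it is safe to stop Redis. This is only a sketch: the function names `count_mw_clients` and `is_drained` are made up for illustration, and it assumes `netstat -tuap`-style input with resolved host names (IPv4 peers printed as `host:port`).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tuap`-style output on stdin and print
# "<count> <host>" for every mw* client with an ESTABLISHED connection.
count_mw_clients() {
  awk '
    $6 == "ESTABLISHED" {
      split($5, peer, ":")                 # peer[1] = remote host, peer[2] = port
      if (peer[1] ~ /mw/) seen[peer[1]]++  # same filter as "grep mw"
    }
    END { for (h in seen) print seen[h], h }
  ' | sort -k2
}

# Hypothetical helper: succeed only when no mw* client is still connected.
is_drained() {
  [ -z "$(count_mw_clients)" ]
}
```

Possible usage on the Redis host itself, assuming the functions have been sourced into the shell:

 sudo netstat -tuap | count_mw_clients
 sudo netstat -tuap | is_drained && echo "no mw clients left"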