User:Elukey/Ops/JessieMigration
Background
https://phabricator.wikimedia.org/T123711
Target hosts
- mc1004.eqiad.wmnet
- mc1005.eqiad.wmnet
Prerequisites
- Log the work in #ops
- Schedule three hours of downtime for the two hosts in Icinga
- Make sure that https://gerrit.wikimedia.org/r/#/c/268311 has been merged
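The three-hour downtime can also be expressed as an Icinga external command rather than set through the web UI. The sketch below only prints the command line and submits nothing. It assumes the standard Nagios/Icinga 1 `SCHEDULE_HOST_DOWNTIME` format. The function name `downtime_command`, the default author `ops` and the comment text are hypothetical. The Icinga host name to pass (short name or FQDN) depends on the local configuration.

```shell
#!/bin/sh
# downtime_command HOST START_EPOCH [HOURS] [AUTHOR]
# Prints a SCHEDULE_HOST_DOWNTIME external command in the format:
# [time] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
# Hypothetical helper: it prints the line and does not write to any command file.
downtime_command() {
    host="$1"
    start="$2"
    hours="${3:-3}"      # default: the three hours asked for on this page
    author="${4:-ops}"   # assumed default author, adjust as needed
    duration=$((hours * 3600))
    end=$((start + duration))
    # fixed=1 (fixed window), trigger_id=0 (not triggered by another downtime)
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$start" "$host" "$start" "$end" "$duration" "$author" \
        "Jessie reinstall T123711"
}

# Example: print the command for both target hosts, starting now.
now="$(date +%s)"
for h in mc1004 mc1005; do
    downtime_command "$h" "$now"
done
```

Note that a host downtime does not silence the host's service checks. Those need `SCHEDULE_HOST_SVC_DOWNTIME`, which takes the same arguments.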
Procedure
Remove the host from the pools
Send a code review like https://gerrit.wikimedia.org/r/#/c/269378/
You will probably see ~25 minutes of errors due to puppet upgrading and restarting nutcracker: https://logstash.wikimedia.org/#/dashboard/elasticsearch/memcached
Disable the running services
SSH to mc1004.eqiad.wmnet and stop redis and memcached:
sudo service redis-instance-tcp_6379 stop
sudo service memcached stop
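Before moving on, it can be worth confirming that nothing is still listening on the service ports. The sketch below is a hypothetical helper, not part of the original procedure. It parses `ss -tln` output from stdin. Port 6379 comes from the redis instance name above. Port 11211 is only the memcached default and is an assumption about this host.

```shell
#!/bin/sh
# still_listening PORT
# Reads `ss -tln` style output on stdin and exits 0 if any socket is in
# LISTEN state on PORT, 1 otherwise. Hypothetical helper name.
still_listening() {
    awk -v port="$1" '
        # Column 4 is "Local Address:Port"; the port is the last ":" field,
        # which also works for IPv6 addresses such as [::]:6379.
        $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the host (11211 is the assumed memcached port):
#   for port in 6379 11211; do
#       if ss -tln | still_listening "$port"; then
#           echo "port $port is still in use"
#       fi
#   done
```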
Prepare the re-installation
Start with Server Lifecycle#Reinstallation:
ssh mc1004.eqiad.wmnet
sudo -i puppet agent --disable
Then destroy the key on the puppet-master:
ssh palladium.eqiad.wmnet
sudo -i puppet cert clean mc1004.eqiad.wmnet
and the salt key:
ssh neodymium.eqiad.wmnet
sudo -i salt-key -d mc1004.eqiad.wmnet
Then ssh to the console and reboot the host:
ssh root@mc1004.mgmt.eqiad.wmnet
Follow the "Reboot and boot from network then console" section of Platform-specific documentation/Dell PowerEdge RN10 to boot from PXE. This re-installs the host, after which you can move on to the next step.
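The commands in this section are written for mc1004, but the same steps apply to mc1005. As a minimal sketch, the helper below prints the per-host commands so they can be copied into the right terminal. It runs nothing itself. The name `print_reinstall_prep` is hypothetical, and the management address is derived on the assumption that every host follows the `mc1004.mgmt.eqiad.wmnet` pattern shown above.

```shell
#!/bin/sh
# print_reinstall_prep FQDN
# Dry-run helper (hypothetical name): prints the preparation commands from
# this section for one host. Nothing is executed on any machine.
print_reinstall_prep() {
    host="$1"
    if [ -z "$host" ]; then
        echo "usage: print_reinstall_prep <fqdn>" >&2
        return 1
    fi
    # mc1004.eqiad.wmnet -> mc1004.mgmt.eqiad.wmnet (assumed naming pattern)
    mgmt="${host%%.*}.mgmt.${host#*.}"
    cat <<EOF
# on $host
sudo -i puppet agent --disable
# on palladium.eqiad.wmnet
sudo -i puppet cert clean $host
# on neodymium.eqiad.wmnet
sudo -i salt-key -d $host
# management console
ssh root@$mgmt
EOF
}

for h in mc1004.eqiad.wmnet mc1005.eqiad.wmnet; do
    print_reinstall_prep "$h"
done
```

Printing instead of executing keeps the destructive steps (certificate and salt key removal) under manual control.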
Post-installation
Follow Server Lifecycle#Post-Install: Get puppet running.
At this stage the host should be ready to go, and nutcracker should have picked it up automatically. Please check the following dashboard to ensure that everything is fine:
https://logstash.wikimedia.org/#/dashboard/elasticsearch/memcached