
User:Elukey/Ops/JessieMigration

From Wikitech

Background

https://phabricator.wikimedia.org/T123711

Target hosts

  • mc1004.eqiad.wmnet
  • mc1005.eqiad.wmnet

Prerequisites

Procedure

Remove the host from the pools

Send a code review similar to https://gerrit.wikimedia.org/r/#/c/269378/ to remove the host from the pools.

Expect roughly 25 minutes of errors in the memcached Logstash dashboard while puppet updates and restarts nutcracker: https://logstash.wikimedia.org/#/dashboard/elasticsearch/memcached

Also keep an eye on session-loss edit failures: http://graphite.wikimedia.org/render/?width=586&height=308&target=MediaWiki.edit.failures.session_loss.count

Disable the running services

ssh to mc1004.eqiad.wmnet and stop redis and memcached:

sudo service redis-instance-tcp_6379 stop
sudo service memcached stop
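The two commands above can be wrapped in a small shell function so that the same step is easy to repeat for mc1005.eqiad.wmnet. This is a minimal sketch, not an existing tool: `run`, `stop_services` and the `DRY_RUN` variable are hypothetical names, and it assumes you can ssh to the host and use sudo non-interactively.

```shell
#!/usr/bin/env bash
# Minimal sketch: stop redis and memcached on a target host over ssh.
# DRY_RUN is a hypothetical convenience variable: when set to 1 the
# commands are printed instead of executed.
set -euo pipefail

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop both cache services on the given host, redis first.
stop_services() {
  local host="$1" svc
  for svc in redis-instance-tcp_6379 memcached; do
    run ssh "$host" sudo service "$svc" stop
  done
}
```

For example, `DRY_RUN=1 stop_services mc1005.eqiad.wmnet` prints the two ssh commands without running them.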

Prepare the re-installation

Follow Server Lifecycle#Reinstallation, starting by disabling puppet on the host:

ssh mc1004.eqiad.wmnet
sudo -i puppet agent --disable

Then clean the host's certificate on the puppet master:

ssh palladium.eqiad.wmnet
sudo -i puppet cert clean mc1004.eqiad.wmnet

and delete its key on the salt master:

ssh neodymium.eqiad.wmnet
sudo -i salt-key -d mc1004.eqiad.wmnet
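The three steps above can be collected into one function, which makes it harder to forget the salt key when repeating the procedure for the second host. This is a minimal sketch with hypothetical names (`prepare_reinstall`, `run`, `DRY_RUN`); it assumes palladium and neodymium are still the puppet and salt masters, and it adds salt-key's `-y` flag so the deletion does not wait for an interactive confirmation.

```shell
#!/usr/bin/env bash
# Minimal sketch of the "Prepare the re-installation" steps for one host.
# Assumption: palladium is the puppet master and neodymium the salt master.
# DRY_RUN is a hypothetical helper variable: 1 means print, do not execute.
set -euo pipefail

PUPPET_MASTER="palladium.eqiad.wmnet"
SALT_MASTER="neodymium.eqiad.wmnet"

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

prepare_reinstall() {
  local host="$1"
  # 1. Keep puppet from running on the host that is about to be reimaged.
  run ssh "$host" sudo -i puppet agent --disable
  # 2. Remove the host's certificate from the puppet master.
  run ssh "$PUPPET_MASTER" sudo -i puppet cert clean "$host"
  # 3. Delete the host's salt key; -y answers the confirmation prompt.
  run ssh "$SALT_MASTER" sudo -i salt-key -y -d "$host"
}
```

Running `DRY_RUN=1 prepare_reinstall mc1005.eqiad.wmnet` first is a cheap way to review the exact commands before executing them.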

Then connect to the host's management console to reboot it:

ssh root@mc1004.mgmt.eqiad.wmnet

Follow Platform-specific documentation/Dell PowerEdge RN10#Reboot and boot from network then console to boot the host from PXE. This re-installs the operating system and leads to the next step.

Post-installation

Follow Server Lifecycle#Post-Install to get puppet running again.
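Once puppet has completed, a quick sanity check is to confirm that both services accept TCP connections again. This is a minimal sketch under several assumptions: redis listens on port 6379 (as the `redis-instance-tcp_6379` service name suggests), memcached on its default port 11211, bash supports `/dev/tcp` redirection (the Debian default), and the firewall lets the machine you run it from reach those ports. `port_open` and `check_cache_host` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that redis (6379) and memcached (11211) accept TCP
# connections on a reinstalled host. Ports are assumptions, see the lead-in.
set -euo pipefail

# Return 0 if HOST:PORT accepts a TCP connection within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one OK/FAIL line per port; return non-zero if any port is closed.
check_cache_host() {
  local host="$1" port rc=0
  for port in 6379 11211; do
    if port_open "$host" "$port"; then
      printf '%-4s %s\n' "OK" "${host}:${port}"
    else
      printf '%-4s %s\n' "FAIL" "${host}:${port}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_cache_host mc1004.eqiad.wmnet` should print two `OK` lines once the host is back in service.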

At this stage the host should be ready, and nutcracker should have picked it up automatically. Check the following dashboards to confirm that memcached errors and session-loss edit failures are back to normal:

https://logstash.wikimedia.org/#/dashboard/elasticsearch/memcached

http://graphite.wikimedia.org/render/?width=586&height=308&target=MediaWiki.edit.failures.session_loss.count
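To get a number rather than eyeballing the graph, the same Graphite metric can be fetched through the render API in CSV format (one `target,timestamp,value` line per datapoint) and summed. This is a minimal sketch: the 30-minute default window, the function names and the use of `curl` and `awk` are assumptions, not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Minimal sketch: sum session-loss edit failures reported by Graphite over a
# recent window, using the render API's CSV output (target,timestamp,value).
set -euo pipefail

GRAPHITE_URL="http://graphite.wikimedia.org/render/"
METRIC="MediaWiki.edit.failures.session_loss.count"

# Read Graphite CSV on stdin and print the sum of the value column.
# Empty values (no data for that interval) are skipped.
sum_csv_values() {
  awk -F, '$3 != "" { total += $3 } END { printf "%g\n", total + 0 }'
}

# Fetch the metric for the given window (default: last 30 minutes) and sum it.
session_loss_total() {
  local window="${1:--30min}"
  curl -fsS "${GRAPHITE_URL}?target=${METRIC}&from=${window}&format=csv" \
    | sum_csv_values
}
```

For example, `session_loss_total -2h` sums the failures over the last two hours; comparing it with a window from before the maintenance gives a rough baseline.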