Portal:Cloud VPS/Admin/notes/Neutron Migration

This page contains historical information. It was only relevant while we were migrating from nova-network to Neutron.

Clearly a lot of this can be automated. Once we've done a few projects without errors, we can mash all this into one big super-script.

Steps

Disable the project in the eqiad region and enable it in eqiad1. This will prevent users from creating new VMs in the old region.
- Example: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/460360/
On labcontrol1001, migrate quotas and security groups.

root@labcontrol1001:~# cd /root
root@labcontrol1001:~# source ~/novaenv.sh 
root@labcontrol1001:~# wmcs-region-migrate-quotas <project-name>
Updated quotas using <QuotaSet cores=12, fixed_ips=200, floating_ips=0, injected_file_content_bytes=10240, injected_file_path_bytes=255, injected_files=5, instances=8, key_pairs=100, metadata_items=128, ram=24576, security_group_rules=20, security_groups=10, server_group_members=10, server_groups=10>
root@labcontrol1001:~# wmcs-region-migrate-security-groups <project-name>
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv4', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'6ea1ac47-3876-4676-bb01-bf89cb6f4363'}
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv6', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'85eabca5-63f4-43f2-aeb2-64d2e70779f1'}
Updating group default in dest
copying rule: {u'from_port': None, u'group': {u'tenant_id': u'hhvm', u'name': u'default'}, u'ip_protocol': None, u'to_port': None, u'parent_group_id': 357, u'ip_range': {}, u'id': 1526}
copying rule: {u'from_port': -1, u'group': {}, u'ip_protocol': u'icmp', u'to_port': -1, u'parent_group_id': 357, u'ip_range': {u'cidr': u'0.0.0.0/0'}, u'id': 1527}
copying rule: {u'from_port': 22, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 22, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1528}
copying rule: {u'from_port': 5666, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 5666, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1529}
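If you'd rather not type the two per-project commands separately, they can be wrapped in a small shell function. This is only a sketch. The function name migrate_project_metadata is hypothetical. It also assumes that ~/novaenv.sh has already been sourced and that both wmcs-region-migrate-* scripts exit non-zero when they fail, which this page does not confirm.

```shell
# Hypothetical helper: run the quota and security-group migrations for one
# project, in the same order as the manual steps above.
# Assumes ~/novaenv.sh is already sourced and that the wmcs-region-migrate-*
# scripts return a non-zero exit status on failure (unverified).
migrate_project_metadata() {
    local project="${1:-}"
    if [ -z "$project" ]; then
        echo "usage: migrate_project_metadata <project-name>" >&2
        return 2
    fi
    # Quotas first, then security groups; stop if either step fails.
    wmcs-region-migrate-quotas "$project" || return 1
    wmcs-region-migrate-security-groups "$project" || return 1
}
```

Usage: source the snippet on labcontrol1001, then run migrate_project_metadata <project-name>.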

Start 'screen', because the next bit is going to take a while.

root@labcontrol1001:~# screen

Get a list of all VMs in the project

root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack server list
+--------------------------------------+------------------+--------+--------------------+
| ID                                   | Name             | Status | Networks           |
+--------------------------------------+------------------+--------+--------------------+
| d4730c86-a6cc-4cb1-9ebe-a84f26926f24 | hhvm-jmm-vp9     | ACTIVE | public=10.68.19.57 |
| 34522cd3-9628-4035-9faa-6d12e55b0f9f | hhvm-stretch-jmm | ACTIVE | public=10.68.20.46 |
| db3a0098-8707-49bd-846f-9b9629c63658 | hhvm-jmm         | ACTIVE | public=10.68.16.91 |
+--------------------------------------+------------------+--------+--------------------+
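You may want to feed the IDs into a script instead of copying them out of the table. The standard openstackclient output options can strip the listing down to the ID column. In this sketch, the helper name list_project_vm_ids is hypothetical, while -f value and -c ID are regular `openstack server list` formatting options.

```shell
# Hypothetical helper: print only the instance IDs of a project's VMs,
# one per line, instead of the bordered table.
list_project_vm_ids() {
    local project="${1:-}"
    if [ -z "$project" ]; then
        echo "usage: list_project_vm_ids <project-name>" >&2
        return 2
    fi
    # -f value drops the table borders; -c ID keeps only the ID column.
    OS_TENANT_NAME="$project" openstack server list -f value -c ID
}
```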

Migrate the VMs one by one, passing each instance ID from the list above

root@labcontrol1001:~# wmcs-region-migrate d4730c86-a6cc-4cb1-9ebe-a84f26926f24
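Once a few single-VM runs have gone cleanly, the per-VM step could be looped. This is a minimal sketch using a hypothetical migrate_vms function. It assumes, without verification, that wmcs-region-migrate exits non-zero on failure. The loop stops at the first failure, so the broken VM can be looked at before anything else moves.

```shell
# Hypothetical helper: migrate the given instance IDs one at a time and
# stop at the first failure, leaving the remaining VMs untouched.
# Assumes wmcs-region-migrate exits non-zero when a migration fails.
migrate_vms() {
    local id
    for id in "$@"; do
        echo "=== migrating $id ==="
        if ! wmcs-region-migrate "$id"; then
            echo "migration of $id failed; stopping" >&2
            return 1
        fi
    done
}
```

For example, inside the screen session: migrate_vms $(OS_TENANT_NAME=<project-name> openstack server list -f value -c ID). Stopping rather than carrying on means at most one VM is ever half-migrated at a time.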

Check the migrated VM to see what broke (known problems are listed under Common issues below).
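A quick first check is whether the new region reports the VM as ACTIVE. This sketch makes two assumptions. First, the novaenv.sh credentials must be able to reach eqiad1 when OS_REGION_NAME=eqiad1 is set. Second, VM names must be unique within the project, since `openstack server show` accepts either a name or an ID. The name check_vm_status is hypothetical.

```shell
# Hypothetical post-migration check: print the status eqiad1 reports for a VM
# and succeed only if it is ACTIVE.
# Assumes the sourced credentials work for the eqiad1 region and that the
# VM name (or ID) is unique within the project.
check_vm_status() {
    local project="${1:-}" vm="${2:-}" status
    if [ -z "$project" ] || [ -z "$vm" ]; then
        echo "usage: check_vm_status <project-name> <vm-name-or-id>" >&2
        return 2
    fi
    status=$(OS_TENANT_NAME="$project" OS_REGION_NAME=eqiad1 \
        openstack server show "$vm" -f value -c status) || return 1
    echo "$vm: $status"
    # Anything other than ACTIVE deserves a closer look.
    [ "$status" = "ACTIVE" ]
}
```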

Special Concerns for Kubernetes Nodes

When moving a Kubernetes worker (or anything that connects to the flannel network for that matter), you must reload ferm on every flannel etcd node (currently that means tools-flannel-etcd-0[1-3].tools.eqiad.wmflabs). After that, run puppet on the worker node to put everything to rights.

Also, don't forget that some worker nodes still have a broken image with a bad /etc/resolv.conf. Do check that file.

Therefore the process for moving a worker node is:

drain and cordon
move with wmcs-region-migrate
fix /etc/resolv.conf if needed
sudo systemctl reload ferm on tools-flannel-etcd-0[1-3]
run puppet
uncordon after validating the node is "Ready" in kubectl get nodes
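The last three worker steps (reload ferm, run puppet, uncordon) can be sketched as shell functions. Several assumptions here are not confirmed by this page:

- the functions run somewhere with kubectl access to the cluster and SSH plus sudo access to the instances;
- the kubectl node name is also a hostname ssh can reach;
- the etcd host list is simply tools-flannel-etcd-0[1-3] expanded.

All function names are hypothetical. Draining, the migration itself and the resolv.conf check stay manual.

```shell
# Hypothetical helpers for the last three worker-node steps.
# Assumes kubectl access to the cluster plus SSH (with sudo) to the instances,
# and that each kubectl node name is also a hostname ssh can reach.

# Expanded from tools-flannel-etcd-0[1-3].tools.eqiad.wmflabs.
FLANNEL_ETCD_NODES="tools-flannel-etcd-01.tools.eqiad.wmflabs
tools-flannel-etcd-02.tools.eqiad.wmflabs
tools-flannel-etcd-03.tools.eqiad.wmflabs"

# Reload ferm on every flannel etcd node, stopping at the first failure.
reload_flannel_ferm() {
    local host
    for host in $FLANNEL_ETCD_NODES; do
        ssh "$host" sudo systemctl reload ferm || return 1
    done
}

# Succeed only if kubectl reports the node's Ready condition as "True".
node_is_ready() {
    local status
    status=$(kubectl get node "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 1
    [ "$status" = "True" ]
}

# Reload ferm, run puppet on the worker, then uncordon it only if it is Ready.
finish_worker_move() {
    local node="${1:-}" rc=0
    if [ -z "$node" ]; then
        echo "usage: finish_worker_move <node>" >&2
        return 2
    fi
    reload_flannel_ferm || return 1
    # puppet agent -t uses detailed exit codes: 0 = no changes, 2 = changes applied.
    ssh "$node" sudo puppet agent -t -v || rc=$?
    if [ "$rc" -ne 0 ] && [ "$rc" -ne 2 ]; then
        echo "puppet run on $node failed (exit $rc)" >&2
        return 1
    fi
    if node_is_ready "$node"; then
        kubectl uncordon "$node"
    else
        echo "$node is not Ready; leaving it cordoned" >&2
        return 1
    fi
}
```

Run finish_worker_move <node> after three manual steps: kubectl drain <node> --ignore-daemonsets, the wmcs-region-migrate run, and any resolv.conf fix. The Ready check only runs once. A freshly migrated node may need a little time, so running node_is_ready <node> again later and then uncordoning by hand is reasonable.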

Common issues

Puppet not working because of certificate issues: run sudo rm -rf /var/lib/puppet/ssl on the instance, then run sudo puppet agent -t -v again.