Portal:Cloud VPS/Admin/notes/Neutron Migration
This page contains historical information; it was only relevant while we were migrating from nova-network to Neutron.
Clearly a lot of this can be automated. Once we've done a few projects without errors, we can mash all of this into one big super-script.
Steps
- Disable the project in the eqiad region and enable it in eqiad1. This will prevent users from creating new VMs in the old region.
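The enable/disable itself goes through whatever project-admin tooling you normally use (not covered here). As a quick hedge before starting, it doesn't hurt to confirm the project is reachable in eqiad1 and doesn't already have VMs there; a minimal check, assuming you've sourced ~/novaenv.sh and that OS_REGION_NAME is enough to point the client at the new region:
root@labcontrol1001:~# OS_REGION_NAME=eqiad1 OS_TENANT_NAME=<project-name> openstack server list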
- On labcontrol1001, migrate quotas and security groups.
root@labcontrol1001:~# source ~/novaenv.sh
root@labcontrol1001:~# wmcs-region-migrate-quotas <project-name>
Updated quotas using <QuotaSet cores=12, fixed_ips=200, floating_ips=0, injected_file_content_bytes=10240, injected_file_path_bytes=255, injected_files=5, instances=8, key_pairs=100, metadata_items=128, ram=24576, security_group_rules=20, security_groups=10, server_group_members=10, server_groups=10>
root@labcontrol1001:~# wmcs-region-migrate-security-groups <project-name>
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv4', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'6ea1ac47-3876-4676-bb01-bf89cb6f4363'}
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv6', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'85eabca5-63f4-43f2-aeb2-64d2e70779f1'}
Updating group default in dest
copying rule: {u'from_port': None, u'group': {u'tenant_id': u'hhvm', u'name': u'default'}, u'ip_protocol': None, u'to_port': None, u'parent_group_id': 357, u'ip_range': {}, u'id': 1526}
copying rule: {u'from_port': -1, u'group': {}, u'ip_protocol': u'icmp', u'to_port': -1, u'parent_group_id': 357, u'ip_range': {u'cidr': u'0.0.0.0/0'}, u'id': 1527}
copying rule: {u'from_port': 22, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 22, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1528}
copying rule: {u'from_port': 5666, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 5666, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1529}
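Optionally sanity-check the copy before moving on. Something like the following (a sketch, again assuming OS_REGION_NAME selects the new region) should show the migrated quotas and security group rules:
root@labcontrol1001:~# OS_REGION_NAME=eqiad1 OS_TENANT_NAME=<project-name> openstack quota show
root@labcontrol1001:~# OS_REGION_NAME=eqiad1 OS_TENANT_NAME=<project-name> openstack security group rule list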
- Start 'screen' because the next bit is going to take a while
root@labcontrol1001:~# screen
- Get a list of all VMs in the project
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack server list
+--------------------------------------+------------------+--------+--------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------------------+--------+--------------------+
| d4730c86-a6cc-4cb1-9ebe-a84f26926f24 | hhvm-jmm-vp9 | ACTIVE | public=10.68.19.57 |
| 34522cd3-9628-4035-9faa-6d12e55b0f9f | hhvm-stretch-jmm | ACTIVE | public=10.68.20.46 |
| db3a0098-8707-49bd-846f-9b9629c63658 | hhvm-jmm | ACTIVE | public=10.68.16.91 |
+--------------------------------------+------------------+--------+--------------------+
- Migrate VMs one by one
root@labcontrol1001:~# wmcs-region-migrate d4730c86-a6cc-4cb1-9ebe-a84f26926f24
- See what broke
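For a project with more than a handful of VMs, a small loop over the IDs does the same thing one at a time, and a server list against eqiad1 afterwards is a quick first pass at "see what broke". A sketch, not a tested one-liner:
root@labcontrol1001:~# for id in $(OS_TENANT_NAME=<project-name> openstack server list -f value -c ID); do wmcs-region-migrate $id; done
root@labcontrol1001:~# OS_REGION_NAME=eqiad1 OS_TENANT_NAME=<project-name> openstack server list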
Special Concerns for Kubernetes Nodes
When moving a Kubernetes worker (or anything that connects to the flannel network for that matter), you must reload ferm on every flannel etcd node (currently that means tools-flannel-etcd-0[1-3].tools.eqiad.wmflabs). After that, run puppet on the worker node to put everything to rights.
That said, don't forget that some worker nodes still have a broken image with a bad resolv.conf. Do check that.
Therefore the process for moving a worker node is (a command-level sketch follows the list):
- drain and cordon
- move with wmcs-region-migrate
- fix resolv.conf if needed
- run sudo systemctl reload ferm on tools-flannel-etcd-0[1-3]
- run puppet on the worker node
- uncordon after validating the node is "Ready" in kubectl get nodes
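Roughly the same steps as commands (a sketch; <node-name> and <instance-id> are placeholders, and where you run kubectl from depends on the cluster setup):
On the host you run kubectl from:
kubectl drain --ignore-daemonsets <node-name>
On labcontrol1001:
root@labcontrol1001:~# wmcs-region-migrate <instance-id>
On each of tools-flannel-etcd-0[1-3]:
sudo systemctl reload ferm
On the worker node (after fixing resolv.conf if needed):
sudo puppet agent -t -v
Once the node shows Ready in kubectl get nodes:
kubectl uncordon <node-name>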
Common issues
- Puppet not working because of certificate issues. Run sudo rm -rf /var/lib/puppet/ssl on the instance and then run sudo puppet agent -t -v again.