News/2025 Cloud VPS VXLAN IPv6 migration
The Wikimedia Cloud VPS service is introducing some changes to how virtual networks are implemented, moving from the VLAN protocol to VXLAN, and adding support for IPv6 at the same time.
This change primarily affects Cloud VPS virtual machines (instances) created by users.
Other services like Toolforge or PAWS are unaffected by this change. Adding IPv6 support to Toolforge will be a separate project at a later stage.
What is changing?
- When creating a new virtual machine, Horizon will show additional network options, with a default that will change over the course of the migration period.
- When creating a new virtual machine, there will be an option to attach it to an IPv6 network in dual-stack mode.
- Internally, the OpenStack virtual network will start using the VXLAN protocol, and will stop using the VLAN protocol.
- DNS names for virtual machines running on dual-stack will get both IPv4 and IPv6 records.
The options available for creating virtual machines are:
VLAN/legacy
: the old network we want to get rid of. This will be available, and the default, during the initial stages of the migration.
VXLAN/IPv4-only
: the new network, for virtual machines that don't want to use IPv6 yet. This will eventually become the default, after we gain confidence in how the migration is progressing.
VXLAN/IPv6-dualstack
: the new network, with IPv6 dualstack support. This is the desired default in the final stages of the migration.
Timeline
- Done 2024-XX-YY: Services and documentation are ready for the migration start.
- 2025-01-06: announcement about the transition. 3 network options are made available in Horizon, with VLAN/legacy being the default.
- 2025-03-10: (2 months later) the option to create VMs in VLAN/legacy is disabled in Horizon. VXLAN/IPv4-only becomes the default, with VXLAN/IPv6-dualstack remaining as an option.
- 2025-12-01: (1 year later) we evaluate how the migration is progressing, and may take further actions for the next migration steps.
- 2026-12-01: (2 years later) we expect no VMs in the legacy VLAN to exist. If some exist, we will evaluate what to do. The migration can be considered 'completed'.
- 20XX-XX-XX: (at some point TBD) we may want to disable the VXLAN/IPv4-only VM creation option, or keep it only for special cases upon request.
What should I do?
There are a number of things you can do to make sure you are on the right track for this migration.
When creating a new virtual machine, select the right network option
If you create virtual machines using Horizon, you will be presented with additional network options.
- in normal circumstances, use the default option that you are presented with.
- if you want to adopt IPv6 for your new virtual machine, select the corresponding dualstack option.
If in doubt, feel free to reach out for help.
Create and test virtual machines on the new network
When creating virtual machines on the new network, you may want to check how things behave regarding the network protocols and services you offer or consume.
Some examples:
- if you are reading Wikimedia Dumps from your virtual machine, double check that they work after migration to the new network setup.
- if you are using a web proxy to reach Cloud VPS servers from the internet, verify that it works as expected when you create a new virtual machine in the new network setup.
We don't anticipate you will find any problem, but if you do, please reach out for help.
Review and update security groups
Security groups for your project and virtual machines may need to be updated to explicitly enable IPv6 network flows.
Because of the dual-stack nature of the IPv6 network, some security group rules may need to be duplicated for IPv4 and IPv6. Other rules are network protocol agnostic. There should be information on the Horizon panel about this.
This also includes any network policies you may have inside your own virtual machines. Verify they are ready to work with IPv6 if you are going to use dualstack networking.
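One way to spot rules that exist for IPv4 only is to compare rules by direction, protocol and port range. The following is a minimal sketch in Python, not an official tool. It assumes the rules have been exported as plain dictionaries using the field names of the OpenStack Networking API (ethertype, direction, protocol, port_range_min, port_range_max). The helper name and the sample rules are hypothetical.

```python
def ipv4_rules_missing_ipv6(rules):
    """Return IPv4 rules that have no IPv6 rule with the same direction, protocol and ports."""
    def key(rule):
        return (
            rule.get("direction"),
            rule.get("protocol"),
            rule.get("port_range_min"),
            rule.get("port_range_max"),
        )

    # Collect the (direction, protocol, ports) combinations already covered for IPv6.
    ipv6_keys = {key(r) for r in rules if r.get("ethertype") == "IPv6"}
    return [
        r for r in rules
        if r.get("ethertype") == "IPv4" and key(r) not in ipv6_keys
    ]


# Hypothetical sample: SSH is allowed for both families, PostgreSQL only for IPv4.
sample_rules = [
    {"ethertype": "IPv4", "direction": "ingress", "protocol": "tcp",
     "port_range_min": 22, "port_range_max": 22},
    {"ethertype": "IPv6", "direction": "ingress", "protocol": "tcp",
     "port_range_min": 22, "port_range_max": 22},
    {"ethertype": "IPv4", "direction": "ingress", "protocol": "tcp",
     "port_range_min": 5432, "port_range_max": 5432},
]

missing = ipv4_rules_missing_ipv6(sample_rules)
for rule in missing:
    print("no IPv6 counterpart for", rule["protocol"], rule["port_range_min"])
```

The comparison deliberately ignores the remote address prefix: a rule restricted to an IPv4 prefix cannot be copied verbatim and needs an equivalent IPv6 prefix chosen by hand.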
What are the primary effects of moving to the new network setup?
New address
There are new addresses for the new network options.
Depending on the network option, your virtual machines will start using a new address from a different CIDR:
- VLAN/legacy
  - IPv4 CIDR: 172.16.0.0/21
- VXLAN/IPv4-only
  - IPv4 CIDR: 172.16.8.0/21
- VXLAN/IPv6-dualstack
  - IPv4 CIDR: 172.16.16.0/21
  - IPv6 CIDR: 2a02:ec80:a000:1::/64
Network communication between instances using different addresses is possible, the only limiting factor being firewalling as described below.
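To find out which network option a given address belongs to, the CIDRs listed above can be checked programmatically. The following is a small sketch using Python's standard ipaddress module. The function name and the dictionary layout are illustrative choices, not part of any Cloud VPS tooling.

```python
import ipaddress

# CIDRs for each network option, as listed in this section.
NETWORKS = {
    "VLAN/legacy": ["172.16.0.0/21"],
    "VXLAN/IPv4-only": ["172.16.8.0/21"],
    "VXLAN/IPv6-dualstack": ["172.16.16.0/21", "2a02:ec80:a000:1::/64"],
}


def network_option(address):
    """Return the network option whose CIDR contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for option, cidrs in NETWORKS.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            # Only compare addresses and networks of the same IP version.
            if ip.version == net.version and ip in net:
                return option
    return None


print(network_option("172.16.16.95"))
print(network_option("2a02:ec80:a000:1::353"))
```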
New DNS records
If using the VXLAN/IPv6-dualstack option, your virtual machine will get two DNS records:
user@my-virtual-machine:~$ host my-virtual-machine.my-project.eqiad1.wikimedia.cloud
my-virtual-machine.my-project.eqiad1.wikimedia.cloud has address 172.16.16.95
my-virtual-machine.my-project.eqiad1.wikimedia.cloud has IPv6 address 2a02:ec80:a000:1::353
user@my-virtual-machine:~$ host 172.16.16.95
95.16.16.172.in-addr.arpa domain name pointer my-virtual-machine.my-project.eqiad1.wikimedia.cloud.
user@my-virtual-machine:~$ host 2a02:ec80:a000:1::353
3.5.3.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa domain name pointer my-virtual-machine.my-project.eqiad1.wikimedia.cloud.
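The reverse names in the output above follow mechanically from the addresses: IPv4 octets are reversed under in-addr.arpa, and the 32 hexadecimal digits of the fully expanded IPv6 address are reversed under ip6.arpa. As a short illustration, Python's standard ipaddress module can derive them. The helper name is hypothetical.

```python
import ipaddress


def ptr_name(address):
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


print(ptr_name("172.16.16.95"))
print(ptr_name("2a02:ec80:a000:1::353"))
```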
Network connections using IPv6
If using the VXLAN/IPv6-dualstack option, your virtual machine will start using IPv6 to reach network peers and services that are available over IPv6.
For example, imagine your virtual machine wants to reach https://commons.wikimedia.org/w/api.php.
user@my-virtual-machine:~$ host commons.wikimedia.org
commons.wikimedia.org is an alias for dyna.wikimedia.org.
dyna.wikimedia.org has address 208.80.154.224
dyna.wikimedia.org has IPv6 address 2620:0:861:ed1a::1
Because both your virtual machine and the destination endpoint have IPv6, the initial network connection will be attempted using IPv6.
If, for whatever reason, this initial IPv6 connection fails, the operating system of your virtual machine should automatically fall back to using IPv4 for that network connection.
Please note that for IPv4-only peers, no IPv6 connection will be attempted (i.e., if the DNS query returns no IPv6 address).
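The ordering described above can be pictured with a small sketch. It is a deliberate simplification: real systems apply the address selection rules of the operating system's resolver, which take more factors into account than the address family alone. The function name is hypothetical.

```python
import ipaddress


def attempt_order(addresses):
    """Order resolved addresses so that IPv6 candidates are tried before IPv4 ones."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    # sorted() is stable, so addresses of the same family keep their original order.
    ordered = sorted(parsed, key=lambda ip: 0 if ip.version == 6 else 1)
    return [str(ip) for ip in ordered]


# Addresses taken from the DNS answer shown in this section.
print(attempt_order(["208.80.154.224", "2620:0:861:ed1a::1"]))
# An IPv4-only peer yields no IPv6 candidate at all.
print(attempt_order(["208.80.154.224"]))
```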
Firewall and network policies on IPv6
If using the VXLAN/IPv6-dualstack option, your virtual machine will start using a new firewalling stack for IPv6, primarily security groups.
You should review them and make sure they allow the network traffic you want. Remember that security groups work as an allow-list: anything not explicitly allowed is denied.
If you have additional network policies within your virtual machines, for example on nginx, apache, mysql, docker, podman, nftables, iptables, ferm, firewalld, or anything else, you should review them to make sure they are ready to work with IPv6.
Exposing internet services from your virtual machines over IPv6
If using the VXLAN/IPv6-dualstack option, it will be possible for your virtual machine to start offering services across the internet over IPv6, without NAT being involved.
The only thing you need is to enable the desired port in the security group of your virtual machine. This is similar to what traditionally happens with IPv4 floating IPs.
Please note, Cloud Services Terms of Use still apply. This means, for example, that to expose HTTP/HTTPS services from your virtual machine you should be using a web proxy, so you don't have to deal with end user privacy.
As another example, you could expose the SSH port (TCP/22) of your instance directly to the internet, and thus not need a bastion to access it. This is technically possible, but we recommend against it: using the bastion is likely a more secure, stable and robust setup for accessing your instances via SSH.
In a nutshell:
- don't expose HTTP/HTTPS or SSH services via direct IPv6 access, because there are shared facilities for them.
- feel free to expose other TCP/UDP services via IPv6, always following the ToU regarding privacy.
Common questions and problems
This migration implies that I need to rebuild my virtual machine
Yes, this is true.
I don't want to rebuild my virtual machine, because I will lose data or configuration
Yes, we understand.
Rebuilding it today for the purpose of this migration is not mandatory. You can skip an explicit migration and wait for the next operating system upgrade instead.
Regarding data, you should strongly consider decoupling your data from your virtual machine. This can be done by adding disk space to your instance.
Regarding configuration, you should consider using a configuration management system, such as ansible, puppet, or others.
How are operating system upgrades related to this network migration?
The base operating system we use for virtual machine instances in Cloud VPS is Debian.
Usually, every couple of years there is a new Debian stable release. When that happens, we work proactively with the community to encourage migration from older Debian releases into the newer one. This implies deleting the old virtual machine instances and creating new ones.
Because you will most likely need to rebuild your instance within a couple of years at most, you can pair this network migration with that rebuild, thus requiring just one rebuild instead of two.
However, there are some considerations to pairing the two migrations together, as explained below.
I want to do nothing today, and wait until the next operating system upgrade before doing this migration
Yes, this is possible, as explained above.
However, separating the two is likely easier for you.
For example, proactively adopting IPv6 will most likely make for a smoother transition than doing both a network transition and an operating system upgrade on the same day.
Separating the two migrations will allow you to focus on potential problems independently, making them easier to handle.
I want to do nothing, will anything break?
We have taken measures to ensure normal operations for virtual machines that don't migrate. We are not anticipating any breakage.
If you feel something has stopped working as it should, please reach out for help.
Are virtual machines attached to different networks able to communicate with each other?
Yes, transparently.
If not, please reach out for help.
If I select IPv6 dualstack for my instance, do I need to do anything else to get the IPv6 network working?
Our virtual machine instances are ready to work with IPv6. No special configuration is needed inside them for IPv6 to work.
However, you may want to review firewall policies as described above.
Are there floating IPs for IPv6?
No.
Floating IPs were useful to enable direct no-NAT internet ingress/egress network connectivity to virtual machine instances.
Our IPv6 addresses are global in scope and publicly routable everywhere on the internet.
If your virtual machine has IPv6, you can use security groups to open a TCP or UDP port that will be directly reachable over IPv6, achieving an effect similar to having a floating IP.
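The difference in scope between the two address families can be checked with Python's standard ipaddress module. This is only an illustrative sketch: the function name is hypothetical, and the check classifies an address by its range only. It says nothing about whether security groups actually allow the traffic.

```python
import ipaddress


def is_publicly_routable(address):
    """Return True if the address lies in globally routable address space."""
    return ipaddress.ip_address(address).is_global


# The instance's IPv6 address is global, so no floating IP is needed for it.
print(is_publicly_routable("2a02:ec80:a000:1::353"))
# The instance's IPv4 address is private, so it needs NAT or a floating IP.
print(is_publicly_routable("172.16.16.95"))
```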
I want to have a stable IPv6 address so I can offer services with a stable endpoint
You can use a DNS CNAME for that.
For example, if you have two virtual machine instances:
- vm1.myproject.eqiad1.wikimedia.cloud with IPv6 2a02:ec80:a000:1::beef
- vm2.myproject.eqiad1.wikimedia.cloud with IPv6 2a02:ec80:a000:1::cafe
You could create a DNS record of type CNAME with an FQDN like this:
- myservice.svc.myproject.eqiad1.wikimedia.cloud pointing to instance vm1.myproject.eqiad1.wikimedia.cloud
If later the other VM takes over the work, you can just change the DNS CNAME to:
- myservice.svc.myproject.eqiad1.wikimedia.cloud pointing to instance vm2.myproject.eqiad1.wikimedia.cloud
If you configure your clients to use the service endpoint, you can change the instances back and forth without needing to change the clients.
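The effect of repointing the CNAME can be modelled with a toy lookup table. The sketch below is not a DNS client and does not talk to any server. It only mimics how a resolver follows an alias to the address of its target, using the example names and addresses from this section. The resolve helper and the record layout are hypothetical.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until an AAAA record is found; return the address or None."""
    for _ in range(max_hops):
        rtype, value = records.get(name, (None, None))
        if rtype == "AAAA":
            return value
        if rtype != "CNAME":
            return None
        name = value  # follow the alias to its target
    return None


records = {
    "vm1.myproject.eqiad1.wikimedia.cloud": ("AAAA", "2a02:ec80:a000:1::beef"),
    "vm2.myproject.eqiad1.wikimedia.cloud": ("AAAA", "2a02:ec80:a000:1::cafe"),
    "myservice.svc.myproject.eqiad1.wikimedia.cloud":
        ("CNAME", "vm1.myproject.eqiad1.wikimedia.cloud"),
}
service = "myservice.svc.myproject.eqiad1.wikimedia.cloud"

before = resolve(service, records)
# The other VM takes over: only the CNAME changes, clients keep using the same name.
records[service] = ("CNAME", "vm2.myproject.eqiad1.wikimedia.cloud")
after = resolve(service, records)
print(before, "->", after)
```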
TODO: there seems to be no docs in Wikitech on how to do this with horizon, so we cannot link it here.
I enabled IPv6 in my instances and now everything is slow or not responding at all
In IPv6-enabled systems, network connections are attempted first using IPv6. If they fail, they will be retried using IPv4.
A common problem with this approach is that, in certain scenarios, the first IPv6 attempt can take a long time to be reported as failed. If so, the IPv4 attempt will happen very late, and it may feel like the system is not working.
Known causes for this are:
- misconfigured filtering firewalls at either end of the connection. The firewall may block the IPv6 connection, and there could be a long timeout before the IPv4 attempt is started.
- misconfigured DNS resolution for either end of the connection. The DNS may report the wrong IPv6 address, and thus the initial IPv6 connection attempt will be made against a completely wrong endpoint. It could be nonexistent, blocked by a firewall, or otherwise unreachable.
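To tell which address family is causing the delay, it can help to time one connection attempt per family. The following is a minimal diagnostic sketch using Python's standard socket module. The function name and the default timeout are arbitrary choices, and the host and port are whatever service you are investigating.

```python
import socket
import time


def timed_connect(host, port, family, timeout=3.0):
    """Try one TCP connection over the given address family; return (ok, seconds)."""
    start = time.monotonic()
    try:
        # Restrict name resolution to the requested family (AF_INET or AF_INET6).
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False, time.monotonic() - start
    sockaddr = infos[0][4]
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
        return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False, time.monotonic() - start
    finally:
        sock.close()
```

For example, comparing timed_connect("commons.wikimedia.org", 443, socket.AF_INET6) with the same call using socket.AF_INET shows whether only the IPv6 path is slow. An IPv6 attempt that fails only after the full timeout, while IPv4 succeeds quickly, would be consistent with a firewall silently dropping IPv6 packets.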
I created an instance with IPv6, but I would like to go back to IPv4-only
You have several options:
- temporarily disable IPv6 at the operating system level of your dualstack instance.
- permanently disable IPv6 at the operating system level of your dualstack instance.
- rebuild your virtual machine instance in the VXLAN/IPv4-only network.
To temporarily disable IPv6, run the following commands:
user@instance:~$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
user@instance:~$ sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
If you want to revert this, run the following commands:
user@instance:~$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
user@instance:~$ sudo sysctl -w net.ipv6.conf.default.disable_ipv6=0
To permanently disable IPv6, edit the file /etc/sysctl.conf to add some configuration options:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
Commands:
user@instance:~$ sudo nano /etc/sysctl.conf
user@instance:~$ sudo sysctl -p
If you want to revert this, remove the configuration options from /etc/sysctl.conf and then run the following commands:
user@instance:~$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
user@instance:~$ sudo sysctl -w net.ipv6.conf.default.disable_ipv6=0
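After changing these settings, you may want to confirm what the kernel currently reports. On Linux, each sysctl key is also exposed as a file under /proc/sys, so net.ipv6.conf.all.disable_ipv6 corresponds to /proc/sys/net/ipv6/conf/all/disable_ipv6. The sketch below reads that file. The function name and the proc_root parameter are illustrative, with the latter added only to make the helper easy to test.

```python
from pathlib import Path


def ipv6_disabled(scope="all", proc_root="/proc/sys"):
    """Return True if the kernel reports IPv6 as disabled for the scope ('all' or 'default')."""
    path = Path(proc_root) / "net" / "ipv6" / "conf" / scope / "disable_ipv6"
    # The file contains "1" when IPv6 is disabled and "0" when it is enabled.
    return path.read_text().strip() == "1"
```

Calling ipv6_disabled("all") and ipv6_disabled("default") on the instance should return True after disabling IPv6, and False after reverting.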
Why are you doing this migration?
There are a number of reasons, primarily:
- The old VLAN-based approach to the virtual network has severe limitations in how we can set up our virtualization farm in the physical datacenter.
- The new VXLAN-based approach will make it easier to introduce some additional features on Cloud VPS, like tenant networks.
- We wanted to deploy IPv6 in Cloud VPS, for a number of reasons, including:
  - IPv6 is the modern technology for network connectivity, and the fact that we did not have it could be considered 'technical debt'.
  - There have been a few reports of users having difficulties accessing Cloud VPS over IPv4 because of limitations from their ISP, most likely related to CG-NAT.
  - We had a small, limited IPv4 allocation pool dedicated to floating IPs. With IPv6, there is no need for floating IPs.
  - There have been reports of our general egress IPv4 NAT address nat.cloudgw.eqiad1.wikimediacloud.org getting throttled, blocked, or otherwise rate-limited. This was really inconvenient for some Cloud VPS users. The usage of this egress NAT will decrease as IPv6 adoption increases.
  - When we introduce IPv6 support in Toolforge at a later stage, we will be able to easily associate network traffic with each tool running on Toolforge.
After the migration period
We would like to keep working on a few roadmap items, including:
- full native IPv6 support for everything in Cloud VPS and other services, including proxies, wiki-replicas, PAWS, and others.
- full native IPv6 support for Toolforge, which should help us address a number of long standing challenges network-wise.
- enabling Cloud VPS tenant networks, which would allow a given project to design and establish its own internal virtual network topology, creating routers, network segments and such.