Wikimedia Cloud Services team/EnhancementProposals/Decision record T332191 subdomain for new cloud private subnets

Origin task: phab:T332191

Date of the decision: 2023-04-05

No decision meeting needed, people commenting in the task:

Decision taken

Option 3bis:

Use <vlan-shortname>.<dc>.wikimedia.cloud.

Examples:

  • cloudcontrol1003.private.eqiad.wikimedia.cloud
  • cloudlb2001-dev.private.codfw.wikimedia.cloud
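
To illustrate how a name under this scheme decomposes, here is a minimal Python sketch that splits a cloud-private FQDN into host, VLAN short name and datacenter. The function name `parse_cloud_private_fqdn`, the `CloudPrivateName` type and the `KNOWN_DCS` set are hypothetical and exist only for this example; the set is assumed to contain just the two datacenters that appear in the examples above.

```python
from typing import NamedTuple

BASE_DOMAIN = "wikimedia.cloud"
# Assumption for this sketch: only the two datacenters named in this record.
KNOWN_DCS = {"eqiad", "codfw"}


class CloudPrivateName(NamedTuple):
    host: str
    vlan_shortname: str
    dc: str


def parse_cloud_private_fqdn(fqdn: str) -> CloudPrivateName:
    """Split '<host>.<vlan-shortname>.<dc>.wikimedia.cloud' into its parts."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + BASE_DOMAIN
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {BASE_DOMAIN}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 3 or not all(labels):
        raise ValueError(f"{fqdn!r} does not have exactly three labels before {BASE_DOMAIN}")
    host, vlan_shortname, dc = labels
    if dc not in KNOWN_DCS:
        raise ValueError(f"{dc!r} is not a known datacenter")
    return CloudPrivateName(host, vlan_shortname, dc)


print(parse_cloud_private_fqdn("cloudcontrol1003.private.eqiad.wikimedia.cloud"))
```

Under these assumptions, a VM-style name such as `whatever.project.eqiad1.wikimedia.cloud` is rejected, because `eqiad1` is a deployment name and not a datacenter.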

Note: the original decision was option 2, but during implementation we decided to go with option 3bis.

This is documented at Portal:Cloud_VPS/Admin/DNS.

Problem

As part of the latest network isolation project, new per-rack VLANs and subnets, called cloud-private, have been allocated. See https://phabricator.wikimedia.org/T324992#8671971.

The data is copied here for reference:

>
>"supernet": 172.20.0.0/16
>
>|Vlan Name|Vlan ID|Subnet|
>|-------------|---------|--------|
>|cloud-private-c8-eqiad|1151|172.20.1.0/24|
>|cloud-private-d5-eqiad|1152|172.20.2.0/24|
>|cloud-private-e4-eqiad|1153|172.20.3.0/24|
>|cloud-private-f4-eqiad|1154|172.20.4.0/24|
>|cloud-private-b1-codfw|2151|172.20.5.0/24|
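
As a quick sanity check on the allocation above, the following Python sketch uses the standard `ipaddress` module to confirm that every per-rack subnet falls inside the supernet, and to look up which VLAN a given address belongs to. The data is copied from the table; the helper name `vlan_for` and the sample address are illustrative assumptions.

```python
import ipaddress

SUPERNET = ipaddress.ip_network("172.20.0.0/16")

# VLAN name -> (VLAN ID, subnet), copied from the table above.
CLOUD_PRIVATE_VLANS = {
    "cloud-private-c8-eqiad": (1151, ipaddress.ip_network("172.20.1.0/24")),
    "cloud-private-d5-eqiad": (1152, ipaddress.ip_network("172.20.2.0/24")),
    "cloud-private-e4-eqiad": (1153, ipaddress.ip_network("172.20.3.0/24")),
    "cloud-private-f4-eqiad": (1154, ipaddress.ip_network("172.20.4.0/24")),
    "cloud-private-b1-codfw": (2151, ipaddress.ip_network("172.20.5.0/24")),
}


def vlan_for(address: str):
    """Return the cloud-private VLAN name containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, (_vlan_id, subnet) in CLOUD_PRIVATE_VLANS.items():
        if ip in subnet:
            return name
    return None


# Every per-rack subnet must sit inside the supernet.
all_inside = all(
    subnet.subnet_of(SUPERNET) for _vlan_id, subnet in CLOUD_PRIVATE_VLANS.values()
)
print(all_inside)
print(vlan_for("172.20.1.10"))  # hypothetical host address in rack c8
```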

These new IP addresses will be allocated and assigned per physical hardware host, in parallel to the traditional `10.x.y.z` addresses that we know and love for ssh/puppet/management, etc. The `10.x.y.z` addresses use the `<datacenter>.wmnet` naming, but since these new addresses are considered natively cloud realm (even though not virtual), we won't be using `wmnet`.

This decision request is to decide on the subdomain to use for them.

Our current "policy" for domain names is at https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS, which should be updated with the results of this decision. As of today, the policy suggests that the domain we should use is `wikimedia.cloud`, because it replaced `eqiad.wmflabs`, which was the cloud counterpart to `eqiad.wmnet`.


Constraints and risks

  • Make sure whatever domain we use makes it clear that they are HW servers and not virtual machines.
  • This won't really be exposed to end users/customers, so we have a bit more freedom to pick one option and reconsider it a couple of years later.
  • We already have some precedents in `enwiki.analytics.db.svc.wikimedia.cloud` FQDNs. They use the `svc` subdomain. That subdomain is not a good fit for this case, since these aren't service IP addresses.
  • The chosen subdomain must be hosted by wikiland DNS servers to avoid chicken-and-egg problems (the domain being unavailable because the cloud is down, while the config of some core cloud service relies on the FQDNs for startup).


Options

Option 1

Use `<dc>.wikimedia.cloud`.

Examples:

  • `cloudcontrol1003.eqiad.wikimedia.cloud`
  • `cloudlb2001-dev.codfw.wikimedia.cloud`

Pros:

  • Simple and straightforward 'mirror' of the `<dc>.wmnet` scheme.

Cons:

  • In some cases it may be too similar to VM FQDNs, like `whatever.project.eqiad1.wikimedia.cloud`.

Option 2

Use `<dc>.hw.wikimedia.cloud`.

Examples:

  • `cloudcontrol1003.eqiad.hw.wikimedia.cloud`
  • `cloudlb2001-dev.codfw.hw.wikimedia.cloud`

Pros:

  • The explicit `hw` keyword (meaning: hardware) should help make it clear that this is an IP on a hardware server and not on a virtual machine.

Cons:

  • Slightly longer to type.

Option 3

Use `<vlan>.wikimedia.cloud`.

Examples:

  • `cloudcontrol1003.cloud-private-c8-eqiad.wikimedia.cloud`
  • `cloudlb2001-dev.cloud-private-b1-codfw.wikimedia.cloud`

Pros:

  • Extra clear about what this is, as it explicitly encodes the DC, the rack and the VLAN name.

Cons:

  • Long and complex to type.
  • If a host is relocated to a different rack, its FQDN will need to be updated, making FQDNs less stable over time than with the other options.

Option 3bis

Use `<vlan-shortname>.<dc>.wikimedia.cloud`.

Examples:

  • `cloudcontrol1003.private.eqiad.wikimedia.cloud`
  • `cloudlb2001-dev.private.codfw.wikimedia.cloud`

Pros:

  • Extra clear about what this is, as it explicitly encodes the DC and the VLAN short name.

Cons:

  • none!
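
To compare the four schemes side by side, here is a small Python sketch that builds the FQDN for a host under each option. The function names are hypothetical and exist only for this comparison; the `private` default for the VLAN short name is taken from the option 3bis examples above.

```python
BASE_DOMAIN = "wikimedia.cloud"


def fqdn_option1(host: str, dc: str) -> str:
    # <host>.<dc>.wikimedia.cloud
    return f"{host}.{dc}.{BASE_DOMAIN}"


def fqdn_option2(host: str, dc: str) -> str:
    # <host>.<dc>.hw.wikimedia.cloud
    return f"{host}.{dc}.hw.{BASE_DOMAIN}"


def fqdn_option3(host: str, vlan: str) -> str:
    # <host>.<vlan>.wikimedia.cloud; the VLAN name embeds the rack and the DC
    return f"{host}.{vlan}.{BASE_DOMAIN}"


def fqdn_option3bis(host: str, dc: str, vlan_shortname: str = "private") -> str:
    # <host>.<vlan-shortname>.<dc>.wikimedia.cloud
    return f"{host}.{vlan_shortname}.{dc}.{BASE_DOMAIN}"


for name in (
    fqdn_option1("cloudcontrol1003", "eqiad"),
    fqdn_option2("cloudcontrol1003", "eqiad"),
    fqdn_option3("cloudcontrol1003", "cloud-private-c8-eqiad"),
    fqdn_option3bis("cloudcontrol1003", "eqiad"),
):
    print(name)
```

Only `fqdn_option3` takes the rack-specific VLAN name as input, which is why a rack move changes the FQDN under option 3 but not under the other options.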