Portal:Cloud VPS/Admin/OpenTofu

From Wikitech

We use OpenTofu to manage some cluster-wide OpenStack resources using the code in gitlab:repos/cloud/cloud-vps/tofu-infra.

Source of truth

As of this writing, the tofu-infra repo is the source of truth for a number of admin-controlled OpenStack elements. For example (this list may be outdated):

  • nova flavors
  • neutron networks, subnets, routers, router ports and security groups
  • projects
  • DNS zones, and some DNS records

Any changes to these items should be made via tofu-infra.

Usage

We are improving the tofu workflows as usage grows and patterns emerge.

Automated workflow via cookbook

See also T370414 - tofu-infra: create a cookbook automation to run tofu

This cookbook-based procedure will do everything you need. Please note that by default the cookbook operates on all deployments/regions (i.e., eqiad1-r and codfw1dev-r).

To apply the latest configuration in the repository (main branch):

user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --apply

To plan the changes associated with a given gitlab MR:

user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --gitlab-mr 27 --plan

To plan an MR on a single region only:

user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --gitlab-mr 27 --plan --cluster-name codfw1dev
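One possible review sequence, sketched using only the flags shown above: plan the MR on codfw1dev first, then on all deployments. The plan_mr function, the run helper and the DRY_RUN switch are illustrative assumptions and are not part of the cookbook.

```shell
#!/bin/sh
# Sketch only: plan_mr, run and DRY_RUN are hypothetical helpers.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Plan a gitlab MR on the codfw1dev deployment first, then on all deployments.
plan_mr() {
    mr="$1"
    run sudo cookbook wmcs.openstack.tofu --gitlab-mr "$mr" --plan --cluster-name codfw1dev
    run sudo cookbook wmcs.openstack.tofu --gitlab-mr "$mr" --plan
}
```

Calling DRY_RUN=1 plan_mr 27 only prints the two commands; without DRY_RUN they are executed, which only makes sense on a cloudcumin host.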

Manual workflow

This procedure describes the steps required to run opentofu manually for a given openstack deployment.

  1. Log in to a cloudcontrol on the deployment you want to run tofu on
  2. Run Puppet agent (to pull latest changes from the Git repo)
  3. $ cd /srv/tofu-infra
  4. $ sudo tofu plan
  5. $ sudo tofu apply
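The steps above can be sketched as a single shell function that stops at the first failure. This is a sketch under assumptions: the run-puppet-agent helper name, the TOFU_DIR override and the DRY_RUN switch do not come from this page.

```shell
#!/bin/sh
# Sketch of the manual workflow, meant to be run on a cloudcontrol.
# Assumptions: run-puppet-agent is the Puppet helper available on the host;
# TOFU_DIR and DRY_RUN are hypothetical conveniences.
set -eu

TOFU_DIR="${TOFU_DIR:-/srv/tofu-infra}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

tofu_manual_run() {
    run sudo run-puppet-agent   # step 2: pull the latest changes from the Git repo
    run cd "$TOFU_DIR"          # step 3
    run sudo tofu plan          # step 4: review the proposed changes
    run sudo tofu apply         # step 5: tofu asks for confirmation before applying
}
```

With DRY_RUN=1 the function only prints the commands, which is a cheap way to check the sequence before running it for real.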

Automated workflow via gitlab CI

See also T370652 - tofu-infra: introduce additional gitlab-ci automation

There is some gitlab CI integration, but at the time of this writing it only does basic validation and linting, so gitlab CI plays little part in the workflow for now.

However, having a workflow automated via gitlab CI is probably the desired end state.

Setup

There's a dedicated service account that OpenTofu authenticates with. The password for this account is in the cloudvps-tofu-admin-account pwstore file.

That account has full OpenStack access to the default domain:

$ sudo wmcs-openstack role add --domain default --inherited --user tofuadmin admin

The tofu binary is actually a thin wrapper that loads credentials from /etc/tofu.env, which looks something like this:

export AWS_ENDPOINT_URL_S3="https://object.codfw1dev.wikimediacloud.org"
export AWS_REGION="codfw1dev-r"
export AWS_ACCESS_KEY_ID="SOMETHING"
export AWS_SECRET_ACCESS_KEY="SOME-SECRET-KEY"
export OS_CLOUD="tofu"
export OS_REGION_NAME="codfw1dev-r"
export TF_VAR_cloudvps_region="codfw1dev-r"
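A minimal sketch of what such a wrapper could look like, assuming the env file contains only export lines as in the example above. The REAL_TOFU path and the TOFU_ENV_FILE override are hypothetical; the wrapper actually deployed on the cloudcontrols may differ.

```shell
#!/bin/sh
# Sketch of a thin wrapper around the real tofu binary.
# Assumptions: REAL_TOFU and TOFU_ENV_FILE are hypothetical names and paths.
set -eu

TOFU_ENV_FILE="${TOFU_ENV_FILE:-/etc/tofu.env}"
REAL_TOFU="${REAL_TOFU:-/usr/bin/tofu.real}"   # hypothetical location of the real binary

tofu_wrapper() {
    if [ ! -r "$TOFU_ENV_FILE" ]; then
        echo "cannot read $TOFU_ENV_FILE" >&2
        return 1
    fi
    # The file consists of 'export VAR=value' lines, so sourcing it makes
    # the credentials visible to the process started below.
    . "$TOFU_ENV_FILE"
    "$REAL_TOFU" "$@"
}
```

Saved as a standalone script, the file would end with a call such as tofu_wrapper "$@", so that sudo tofu plan passes its arguments through unchanged.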

cookbook setup notes

The wmcs.openstack.tofu cookbook uses a gitlab token to be able to write comments to a merge request.

The token is valid for 1 year.

To regenerate:

Known problems

We are in the initial stages of adopting this IaC solution for Cloud VPS. There are some rough edges that are documented here.

radosgw endpoint 500 error

This happens from time to time for unknown reasons. Just retry the tofu operation.

See also: https://phabricator.wikimedia.org/T360626

DNS records of type NS in root zone cannot be updated

When updating the NS recordset of a root zone, the openstack designate API won't let a POST action update the data. You will need to delete the record before running tofu, which will then create it from scratch.
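The delete-then-recreate sequence could be sketched as follows. This is a sketch under assumptions: it relies on the recordset delete subcommand of the designate OpenStack client plugin being reachable through wmcs-openstack, it may need extra project-scoping options in practice, and the function name, the run helper and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch only: delete an NS recordset by hand, then let tofu recreate it.
# Assumptions: recreate_ns_recordset, run and DRY_RUN are hypothetical helpers;
# extra designate options (for example project scoping) may be required.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_ns_recordset() {
    zone="$1"        # zone name or ID
    recordset="$2"   # ID of the NS recordset to delete
    run sudo wmcs-openstack recordset delete "$zone" "$recordset"
    run cd "${TOFU_DIR:-/srv/tofu-infra}"
    run sudo tofu apply   # recreates the recordset from the repo definition
}
```

Running it with DRY_RUN=1 first shows exactly which recordset would be removed before anything is touched.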

See also:

Importing security group rules

We don't import security group rules because it is too much work.

If you relocate security group rules from the openstack DB into the tofu-infra repo, you will need to delete all rules by hand before the first tofu apply run. The tofu apply run will then re-create them.

Horizon allows for a very convenient 'select all' security group rules, and then 'delete all' workflow, so this should be somewhat easy to do.

POTENTIAL OUTAGE NOTE: the time between you deleting the secrules via horizon and tofu recreating them could introduce downtime for services covered in the sec group.
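As a command-line alternative to the Horizon workflow, the rule deletion could be sketched like this. The same outage caveat applies. The function name, the run helper and DRY_RUN are hypothetical, and the sketch assumes that the standard security group rule list and security group rule delete subcommands are available through wmcs-openstack.

```shell
#!/bin/sh
# Sketch only: delete every security group rule whose ID arrives on stdin.
# Assumptions: delete_secgroup_rules, run and DRY_RUN are hypothetical helpers.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

delete_secgroup_rules() {
    while read -r rule_id; do
        [ -n "$rule_id" ] || continue   # skip blank lines
        # </dev/null keeps the command from consuming the remaining IDs on stdin
        run sudo wmcs-openstack security group rule delete "$rule_id" </dev/null
    done
}
```

A possible invocation, with the group ID left as a placeholder: sudo wmcs-openstack security group rule list <group> -f value -c ID | DRY_RUN=1 delete_secgroup_rules. Drop DRY_RUN only once the printed list looks right, and run tofu apply immediately afterwards.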

Managing the default security group

If you want to track the default security group of a project (i.e., using manage_default_secgroup: true), you will need to delete all security group rules by hand before running tofu apply for the first time. Tofu will then recreate them.

This is because we haven't implemented any logic to import security group rules, as it would be an overwhelming amount of work.

Horizon allows for a very convenient 'select all' security group rules, and then 'delete all' workflow, so this should be somewhat easy to do.

See also T375111 - openstack: clarify default security group semantics.

POTENTIAL OUTAGE NOTE: the time between you deleting the secrules via horizon and tofu recreating them could introduce downtime for services covered in the sec group.

Vendored openstack provider

As of this writing, the tofu-infra repository contains a vendored copy of the opentofu openstack provider. The main reason is that, in the current setup, the servers that execute tofu plan and tofu apply (cloudcontrols) don't have direct internet access, so they cannot download the provider at execution time.

Having this binary blob stored in the git repository may be controversial. Other valid approaches that we could take, for example:

In the current vendored model, if you want to update the version of the provider, do this in a checkout of the tofu-infra repository on your laptop:

  1. set the desired version in the providers.tf file
  2. remove the vendor-providers directory
  3. run the command tofu init -backend=false -upgrade
  4. run the command make
  5. send a MR with the changes
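Steps 2 to 4 above can be sketched as one function, to be run from the root of a tofu-infra checkout after editing providers.tf by hand. The function name, the run helper and the DRY_RUN switch are hypothetical; the commands themselves are the ones listed in the steps.

```shell
#!/bin/sh
# Sketch only: refresh the vendored provider after providers.tf was edited.
# Assumptions: update_vendored_provider, run and DRY_RUN are hypothetical helpers.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_vendored_provider() {
    # Refuse to run outside the repository root.
    if [ ! -f providers.tf ]; then
        echo "providers.tf not found: run this from the tofu-infra checkout" >&2
        return 1
    fi
    run rm -rf vendor-providers                # step 2
    run tofu init -backend=false -upgrade      # step 3
    run make                                   # step 4
}
```

Setting the version (step 1) and sending the MR (step 5) stay manual, so the result can be reviewed with git diff before it is committed.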

Unsupported resources

There are a number of unsupported resources in the openstack provider. Some of them are documented here:

This means the resources mentioned here cannot be controlled via tofu-infra.

Data migration from openstack to tofu

Some handy commands to prepare data for migration (import) from openstack into tofu.

security group

This command will create a YAML that is _almost_ ready to copy-paste into the tofu-infra repo.

aborrero@cloudcontrol2004-dev:~ $ sudo wmcs-openstack security group show 1628b4ec-259e-4bdc-9d42-44c392ad3f02 -f yaml | egrep -v created_at\|null\|revision\|standard_attr_id\|tags\|updated_at\|^"  id:"\|security_group_id\|^"  tenant_id"\|^"  project_id"\|shared\|stateful\|normalized_cidr\|"description: ''" | sed s/"- belongs_to_default_sg: false"/"-"/g | sed '/^-$/ {N;s/\n//;}' | sed s/"-  "/"- "/g | sed '/^-/ {s/^/\n/;}' | sed s/^"id: "/"import_id: "/g | sed s/^"project_id: "/"project: "/g
description: allow port 80 from anywhere
import_id: 1628b4ec-259e-4bdc-9d42-44c392ad3f02
name: webserver
project: proxy-codfw1dev
rules:

- direction: ingress
  ethertype: IPv4
  port_range_max: 80
  port_range_min: 80
  protocol: tcp
  remote_ip_prefix: 0.0.0.0/0

- direction: egress
  ethertype: IPv4

- direction: ingress
  ethertype: IPv4
  port_range_max: 443
  port_range_min: 443
  protocol: tcp
  remote_ip_prefix: 0.0.0.0/0

- direction: egress
  ethertype: IPv6

See also