Portal:Cloud VPS/Admin/OpenTofu
We use OpenTofu to manage some cluster-wide OpenStack resources using the code in gitlab:repos/cloud/cloud-vps/tofu-infra.
Source of truth
As of this writing, the tofu-infra repo is the source of truth for a number of admin-controlled OpenStack elements. For example (this list may be outdated):
- nova flavors
- neutron networks, subnets, routers, router ports and security groups
- projects
- DNS zones, and some DNS records
Any changes to these items should be made via tofu-infra.
Usage
We are improving the tofu workflows as usage grows and patterns emerge.
Automated workflow via cookbook
This cookbook-based procedure will do everything you need. Please note that by default the cookbook operates on all deployments/regions (i.e., eqiad1-r and codfw1dev-r).
To apply the latest configuration in the repository (main branch):
user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --apply
To plan the changes associated with a given gitlab MR (general case):
user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --no-dologmsg --plan --gitlab-mr 27
To plan an MR on a single region only (in case you need it):
user@cloudcumin1001:~ $ sudo cookbook wmcs.openstack.tofu --no-dologmsg --plan --gitlab-mr 27 --cluster-name codfw1dev
The full workflow including the gitlab MR is like this:
- produce a gitlab MR
- run the cookbook with --gitlab-mr XXX --plan to see the tofu plan
- change and iterate on the gitlab MR as required
- if there are more changes, run the cookbook again to see the updated tofu plan
- once happy with the plan, merge the gitlab MR
- once the MR is merged, run the cookbook with --apply to apply the MR
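The plan-then-apply loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the cookbook: the function names tofu_plan_cmd and tofu_apply_cmd are hypothetical, and the functions only print the command lines documented in this section so they can be reviewed before being run on a cloudcumin host.

```shell
#!/bin/sh
# Hypothetical helpers: print the cookbook invocations documented above.
# Nothing is executed here; run the printed command on a cloudcumin host.

# tofu_plan_cmd MR_NUMBER [CLUSTER_NAME]
tofu_plan_cmd() {
    mr="$1"
    cluster="${2:-}"
    case "$mr" in
        ''|*[!0-9]*) echo "MR number must be numeric" >&2; return 1 ;;
    esac
    cmd="sudo cookbook wmcs.openstack.tofu --no-dologmsg --plan --gitlab-mr $mr"
    if [ -n "$cluster" ]; then
        # Restrict the plan to one deployment, e.g. codfw1dev
        cmd="$cmd --cluster-name $cluster"
    fi
    echo "$cmd"
}

# tofu_apply_cmd: apply whatever is currently on the main branch.
tofu_apply_cmd() {
    echo "sudo cookbook wmcs.openstack.tofu --apply"
}
```

For example, tofu_plan_cmd 27 codfw1dev prints the single-region plan command shown earlier, and tofu_apply_cmd prints the apply command to use once the MR is merged.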
Manual workflow
This procedure describes the steps required to run opentofu manually for a given openstack deployment.
- Log in to a cloudcontrol on the deployment you want to run tofu on
- Run Puppet agent (to pull latest changes from the Git repo)
$ cd /srv/tofu-infra
$ sudo tofu plan
$ sudo tofu apply
If you need to manually test or debug a gitlab MR on a given server, for example for a particularly complex code change, you can do this:
user@cloudcontrol2004-dev:~ $ cd /srv/tofu-infra/
user@cloudcontrol2004-dev:/srv/tofu-infra $ sudo git checkout main ; sudo git fetch --force 'origin' 'merge-requests/118/head:mr-origin-118' ; sudo git checkout --force 'mr-origin-118'
user@cloudcontrol2004-dev:/srv/tofu-infra $ sudo tofu init
user@cloudcontrol2004-dev:/srv/tofu-infra $ sudo tofu plan
user@cloudcontrol2004-dev:/srv/tofu-infra $ sudo TF_LOG=trace tofu plan
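The only part of the commands above that changes between merge requests is the MR number. As a rough sketch, the hypothetical function mr_checkout_cmds below prints the same sequence for any MR number; it assumes the merge-requests/N/head ref and the mr-origin-N branch naming used in the example above, and it does not run anything itself.

```shell
#!/bin/sh
# Hypothetical helper: print the manual MR checkout/plan commands for one MR.
# The output is meant to be reviewed and then run on a cloudcontrol host.

# mr_checkout_cmds MR_NUMBER
mr_checkout_cmds() {
    mr="$1"
    case "$mr" in
        ''|*[!0-9]*) echo "usage: mr_checkout_cmds MR_NUMBER" >&2; return 1 ;;
    esac
    branch="mr-origin-$mr"
    cat <<EOF
cd /srv/tofu-infra
sudo git checkout main
sudo git fetch --force origin merge-requests/$mr/head:$branch
sudo git checkout --force $branch
sudo tofu init
sudo tofu plan
EOF
}
```

Calling mr_checkout_cmds 118 reproduces the sequence shown above, minus the optional TF_LOG=trace run.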
Automated workflow via gitlab CI
There is some gitlab CI integration, but at the time of this writing it only does basic validation and linting, so gitlab CI plays little part in the workflow for now.
However, having a workflow automated via gitlab CI is probably the desired end state.
Setup
There's a dedicated service account that OpenTofu authenticates with. The password for this account is in the cloudvps-tofu-admin-account pwstore file.
That account has full OpenStack access to the default domain:
$ sudo wmcs-openstack role add --domain default --inherited --user tofuadmin admin
Then, the tofu binary is actually a thin wrapper that loads credentials from /etc/tofu.env, something like this:
export AWS_ENDPOINT_URL_S3="https://object.codfw1dev.wikimediacloud.org"
export AWS_REGION="codfw1dev-r"
export AWS_ACCESS_KEY_ID="SOMETHING"
export AWS_SECRET_ACCESS_KEY="SOME-SECRET-KEY"
export OS_CLOUD="tofu"
export OS_REGION_NAME="codfw1dev-r"
export TF_VAR_cloudvps_region="codfw1dev-r"
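To illustrate the idea of a thin wrapper, here is a minimal sketch of how such a script could load the environment file and then hand over to the real binary. This is not the deployed wrapper: the function name run_tofu, the TOFU_ENV_FILE and TOFU_REAL_BIN override variables, and the /usr/bin/tofu.real path are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a credentials-loading wrapper; not the deployed script.

run_tofu() {
    env_file="${TOFU_ENV_FILE:-/etc/tofu.env}"
    real_bin="${TOFU_REAL_BIN:-/usr/bin/tofu.real}"   # assumed location of the real binary

    if [ ! -r "$env_file" ]; then
        echo "cannot read $env_file" >&2
        return 1
    fi

    (
        # Source the exports (AWS_*, OS_*, TF_VAR_*) inside a subshell so
        # they do not leak into the calling shell, then replace the
        # subshell with the real binary, passing all arguments through.
        . "$env_file"
        exec "$real_bin" "$@"
    )
}
```

With this shape, run_tofu plan would behave like the real binary's plan subcommand with the credentials already exported, and the secrets stay confined to the child process.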
cookbook setup notes
The wmcs.openstack.tofu cookbook uses a gitlab token to be able to write comments to a merge request.
The token is valid for 1 year.
To regenerate:
- go to https://gitlab.wikimedia.org/groups/repos/cloud/-/settings/access_tokens
- generate a new token with api scope and developer role, valid for 1 year
- put the token in the puppetserver secret git repo, in file hieradata/role/common/cluster/cloud_management.yaml, hiera key profile::wmcs::spicerack_config::gitlab_token
Known problems
We are in the initial stages of adopting this IaC solution for Cloud VPS. There are some rough edges that are documented here.
DNS records of type NS in root zone cannot be updated
When updating the NS recordset of a root zone, the openstack designate API won't let a POST action update the data. You will need to delete the record before running tofu, which will then create it from scratch.
Importing security group rules
We don't import security group rules because it is too much work.
If you relocate security group rules from the openstack DB into the tofu-infra repo, you will need to delete all the rules by hand before the first tofu apply run. The tofu apply run will then re-create them.
Horizon allows for a very convenient 'select all' security group rules, and then 'delete all' workflow, so this should be somewhat easy to do.
POTENTIAL OUTAGE NOTE: the time between you deleting the secrules via horizon and tofu recreating them could introduce downtime for services covered in the sec group.
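As an alternative to clicking through Horizon, the rules of one security group could also be removed from the command line. The sketch below is an assumption-laden example, not an established procedure: purge_secgroup_rules is a hypothetical function, it relies on the standard security group rule list and security group rule delete subcommands being available through wmcs-openstack, and it defaults to a dry run that only prints what it would delete. The outage note above applies equally to this approach.

```shell
#!/bin/sh
# Hypothetical helper: delete every rule of one security group.
# OPENSTACK can be overridden; the default mirrors the CLI used on this page.
OPENSTACK="${OPENSTACK:-sudo wmcs-openstack}"

# purge_secgroup_rules SECGROUP_ID [apply]
purge_secgroup_rules() {
    secgroup="$1"
    mode="${2:-dry-run}"
    if [ -z "$secgroup" ]; then
        echo "usage: purge_secgroup_rules SECGROUP_ID [apply]" >&2
        return 1
    fi
    # One rule ID per line; $OPENSTACK is intentionally unquoted so that
    # "sudo wmcs-openstack" splits into command and argument.
    $OPENSTACK security group rule list "$secgroup" -f value -c ID |
    while read -r rule_id; do
        [ -n "$rule_id" ] || continue
        if [ "$mode" = "apply" ]; then
            # Redirect stdin so the CLI cannot swallow the remaining IDs.
            $OPENSTACK security group rule delete "$rule_id" < /dev/null
        else
            echo "would delete rule $rule_id"
        fi
    done
}
```

Run it once without the second argument to review the list, then again with apply immediately before the tofu apply run to keep the window without rules as short as possible.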
Managing the default security group
For new projects natively created using tofu-infra, the setting manage_default_secgroup: true is the default (implicit, so this is what you get if you don't specify false). This means no special operations are required to track the default security group and its rules.
For existing projects, created before tofu-infra and imported later, if you want to track the default security group of a project (i.e., using manage_default_secgroup: true), you will need to delete all security group rules by hand before running tofu apply for the first time. Tofu will then recreate them.
This is because we haven't implemented any logic to import security group rules, as it was an overwhelming amount of work.
Horizon allows for a very convenient 'select all' security group rules, and then 'delete all' workflow, so this should be somewhat easy to do.
See also T375111 - openstack: clarify default security group semantics.
POTENTIAL OUTAGE NOTE: the time between you deleting the secrules via horizon and tofu recreating them could introduce downtime for services covered in the sec group.
Vendored openstack provider
As of this writing, the tofu-infra repository contains a vendored copy of the opentofu openstack provider. The main reason for this is that, in the current setup, the servers that execute tofu plan and tofu apply (cloudcontrols) don't have direct internet access, so they cannot download the provider at execution time.
Having this binary blob stored in the git repository may be controversial. Other valid approaches we could take include:
- using an HTTP proxy
- storing the openstack provider in our tofu registry, in terraform.wmcloud.org/registry/. See also https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps
- declaring that this is not a problem after all, and doing nothing
In the current vendored model, if you want to update the version of the provider, do this in the tofu-infra repository, on your laptop:
- set the desired version in the providers.tf file
- remove the vendor-providers directory
- run the command tofu init -backend=false -upgrade
- run the command make
- send an MR with the changes
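The mechanical part of the steps above can be scripted. The following is a minimal sketch under stated assumptions: the function names run_step and update_vendored_provider and the DRY_RUN variable are hypothetical, the script expects to be run from the root of a tofu-infra checkout, and editing providers.tf and sending the MR remain manual. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper for refreshing the vendored provider.
# Run from the root of a tofu-infra checkout, after editing providers.tf.

# run_step CMD [ARGS...]: print the command, or run it when DRY_RUN=0.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_vendored_provider() {
    # Drop the old vendored copy, re-resolve the provider at the version
    # pinned in providers.tf, then let the Makefile regenerate the rest.
    run_step rm -rf vendor-providers &&
    run_step tofu init -backend=false -upgrade &&
    run_step make
}
```

Calling update_vendored_provider prints the three commands for review; DRY_RUN=0 update_vendored_provider executes them in order and stops at the first failure.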
Unsupported resources
There are a number of unsupported resources in the openstack provider. Some of them are documented here:
- trove quotas, see https://github.com/gophercloud/gophercloud/issues/3208 and https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Trove#Adjusting_per-project_Trove_quotas
- neutron BGP dynamic routing, see https://github.com/terraform-provider-openstack/terraform-provider-openstack/issues/1807
- neutron default security group rules, see https://github.com/gophercloud/gophercloud/issues/3210
This means the resources mentioned here cannot be controlled via tofu-infra.
Data migration from openstack to tofu
Some handy commands to prepare data for migration (import) from openstack into tofu.
security group
This command will create a YAML that is _almost_ ready to copy-paste into the tofu-infra repo.
aborrero@cloudcontrol2004-dev:~ $ sudo wmcs-openstack security group show 1628b4ec-259e-4bdc-9d42-44c392ad3f02 -f yaml | egrep -v created_at\|null\|revision\|standard_attr_id\|tags\|updated_at\|^" id:"\|security_group_id\|^" tenant_id"\|^" project_id"\|shared\|stateful\|normalized_cidr\|"description: ''" | sed s/"- belongs_to_default_sg: false"/"-"/g | sed '/^-$/ {N;s/\n//;}' | sed s/"- "/"- "/g | sed '/^-/ {s/^/\n/;}' | sed s/^"id: "/"import_id: "/g | sed s/^"project_id: "/"project: "/g
description: allow port 80 from anywhere
import_id: 1628b4ec-259e-4bdc-9d42-44c392ad3f02
name: webserver
project: proxy-codfw1dev
rules:
- direction: ingress
  ethertype: IPv4
  port_range_max: 80
  port_range_min: 80
  protocol: tcp
  remote_ip_prefix: 0.0.0.0/0
- direction: egress
  ethertype: IPv4
- direction: ingress
  ethertype: IPv4
  port_range_max: 443
  port_range_min: 443
  protocol: tcp
  remote_ip_prefix: 0.0.0.0/0
- direction: egress
  ethertype: IPv6
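The one-liner above is hard to read, so here is a simplified sketch of its core idea as a filter function. It is only a partial equivalent, written for illustration: filter_secgroup_yaml is a hypothetical name, the list of dropped keys is taken from the egrep pattern above, and the list-item cleanup done by the later sed stages (for example removing belongs_to_default_sg and empty descriptions) is not reproduced.

```shell
#!/bin/sh
# Hypothetical filter: reads "security group show -f yaml" output on stdin
# and writes a trimmed version with tofu-infra style key names on stdout.

filter_secgroup_yaml() {
    # 1. Drop server-side bookkeeping keys and any null values.
    grep -Ev 'created_at|null|revision|standard_attr_id|tags|updated_at|security_group_id|tenant_id|shared|stateful|normalized_cidr' |
    # 2. Drop the per-rule (indented) id and project_id keys.
    grep -Ev '^[[:space:]]+(id|project_id):' |
    # 3. Rename the top-level keys.
    sed -e 's/^id: /import_id: /' -e 's/^project_id: /project: /'
}
```

It would be used as the last stage of a pipeline, for example sudo wmcs-openstack security group show SECGROUP_ID -f yaml | filter_secgroup_yaml, and the result still needs a manual review before it goes into the tofu-infra repo.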
See also
- T365696: Investigate how to run OpenTofu to manage Cloud VPS admin-only resources -- original setup ticket
- https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs -- upstream documentation