Portal:Cloud VPS/Admin/DNS

See also: Portal:Cloud VPS/Infrastructure#dns

Private DNS

Within Cloud VPS, each instance has a name like <instancename>.<projectname>.eqiad.wmflabs. For historical reasons we also create <instancename>.eqiad.wmflabs DNS entries for each instance. This legacy behavior may be discontinued in the future. Cloud VPS is not one big flat zone the way production is; it is broken into tenants with different access restrictions.
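
As a small illustration of the naming scheme above, the following sketch builds both the per-project name and the legacy flat name. The function names and the ZONE constant are hypothetical helpers for this example, not part of any Cloud VPS tooling.

```python
# Zone suffix taken from the text above; the helper names are illustrative only.
ZONE = "eqiad.wmflabs"


def instance_fqdn(instance: str, project: str) -> str:
    """Return the per-project name: <instancename>.<projectname>.eqiad.wmflabs."""
    return f"{instance}.{project}.{ZONE}"


def legacy_fqdn(instance: str) -> str:
    """Return the legacy flat name: <instancename>.eqiad.wmflabs."""
    return f"{instance}.{ZONE}"


if __name__ == "__main__":
    print(instance_fqdn("novaproxy-01", "project-proxy"))
    print(legacy_fqdn("novaproxy-01"))
```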

There is a special private domain, svc.eqiad.wmflabs., intended to hold service FQDNs not associated with virtual machines or a specific Cloud VPS project.

Public DNS

Public DNS (e.g. tools-login.wmflabs.org) is currently handled by labs-ns0 and labs-ns1 running pdns with designate.


Cloud VPS DNS is PowerDNS, backed by a database controlled by Designate.

When a new instance is created, a DNS A record like this is created in Designate (under the special noauth-project tenant which novaobserver cannot access?):

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> novaproxy-01.project-proxy.eqiad.wmflabs @labs-ns0.wikimedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21651
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
; EDNS: version: 0, flags:; udp: 2800
;novaproxy-01.project-proxy.eqiad.wmflabs. IN A
novaproxy-01.project-proxy.eqiad.wmflabs. 60 IN	A
;; Query time: 91 msec
;; WHEN: Wed Nov 07 12:48:08 GMT 2018
;; MSG SIZE  rcvd: 85

The public DNS stuff lives under the wmflabsdotorg tenant:

krenair@shinken-02:~$ OS_PROJECT_ID=wmflabsdotorg openstack zone list
| id                                   | name                            | type    |     serial | status | action |
| 553ef162-add7-4a5c-b115-9cabca662746 | wmflabs.org.                    | PRIMARY | 1541584686 | ACTIVE | NONE   |
| ef825cf2-db2d-4480-ad33-2c19b0a188dc | wmflabsdotorg.wmflabs.org.      | PRIMARY | 1535835306 | ACTIVE | NONE   |
| 933a78c2-3d8d-4ee1-bbef-9ab30be5f972 | 128- | PRIMARY | 1535835306 | ACTIVE | NONE   |
| e07defe3-6d08-4f37-bd7d-1bdc1c45d7e8 | 56.15.185.in-addr.arpa.         | PRIMARY | 1541514722 | ACTIVE | NONE   |

When floating IPs are allocated and assigned, DNS records are pointed at them like so:

krenair@shinken-02:~$ OS_PROJECT_ID=wmflabsdotorg openstack recordset show wmflabs.org. ntp-01.wmflabs.org.
| Field       | Value                                |
| action      | NONE                                 |
| created_at  | 2018-11-02T20:34:18.000000           |
| description | time server for cloud instances      |
| id          | 5812d178-b7a8-4168-bce7-5e064b411f82 |
| name        | ntp-01.wmflabs.org.                  |
| records     |                          |
| status      | ACTIVE                               |
| ttl         | 3600                                 |
| type        | A                                    |
| updated_at  | None                                 |
| version     | 1                                    |
| zone_id     | 553ef162-add7-4a5c-b115-9cabca662746 |

Alex's dns-floating-ip-updater.py script creates something like this:

krenair@shinken-02:~$ OS_PROJECT_ID=wmflabsdotorg openstack recordset show 56.15.185.in-addr.arpa.
| Field       | Value                                                                     |
| action      | NONE                                                                      |
| created_at  | 2018-11-02T20:41:47.000000                                                |
| description | MANAGED BY dns-floating-ip-updater.py IN PUPPET - DO NOT UPDATE OR DELETE |
| id          | bad4939b-aaf3-4bef-8878-2351b8944b76                                      |
| name        |                                                 |
| records     | ntp-01.wmflabs.org.                                                       |
|             | ntp-01.cloudinfra.wmflabs.org.                                            |
| status      | ACTIVE                                                                    |
| ttl         | None                                                                      |
| type        | PTR                                                                       |
| updated_at  | 2018-11-02T21:01:45.000000                                                |
| version     | 2                                                                         |
| zone_id     | e07defe3-6d08-4f37-bd7d-1bdc1c45d7e8                                      |

and usually instance-$instance.$project.wmflabs.org records to make it obvious which IP is served by which host.
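
To make the PTR and instance-$instance.$project naming concrete, here is a minimal sketch that derives both names. It is not how dns-floating-ip-updater.py is implemented; the helper names are hypothetical, and 192.0.2.10 is a documentation-only address standing in for a real floating IP.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the fully qualified reverse-lookup name for an IP address."""
    # reverse_pointer omits the trailing dot, so add it to match zone-file style.
    return ipaddress.ip_address(ip).reverse_pointer + "."


def instance_record_name(instance: str, project: str) -> str:
    """Return the instance-$instance.$project.wmflabs.org. name described above."""
    return f"instance-{instance}.{project}.wmflabs.org."


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, not a real floating IP.
    print(ptr_name("192.0.2.10"))
    print(instance_record_name("ntp-01", "cloudinfra"))
```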

Restarting PowerDNS

This section was written for LDAP integration and may not be relevant with Designate (?)

PowerDNS copes very poorly with interruptions in LDAP service. Any time opendj restarts, pdns needs to be restarted as well. So, to refresh either service (LDAP or DNS):

   $ sudo service opendj restart (on nembus and/or neptunium)
   $ sudo service pdns restart (on virt1000 and labcontrol2001)

Managing records by hand

Designate records can be created by hand using the API or the CLI (or horizon).

Designate services

Designate has a lot of moving parts. On a given designate host, you will (or may) see the following things running:

  • designate-api: the REST endpoint for creating or deleting zones or records. Horizon and the commandline client talk directly to this.
  • designate-sink: listens (via rabbitmq) for nova notifications about instance creation or deletion; creates and deletes records under .eqiad.wmflabs as needed.
  • designate-mdns: a tiny dns server maintained by designate -- this exists mainly as a source of authority for axfr sync requests with powerdns. axfr can only sync records and not create domains.
  • designate-pool-manager: since axfr is only useful for syncing records, designate-pool-manager holds credentials to write directly to the pdns database; it uses those creds to insert or remove pdns domain records as needed.
  • designate-central: coordinates all of the above
  • designate-agent: a leftover service from an older version of designate that doesn't do anything but is somehow still installed by the .deb packages. If it's running, feel free to stop it; puppet doesn't ensure that it's either up or down.

Note that this suite of services gets renamed and/or refactored with almost every Designate release. The above list is for Designate Mitaka; the list is almost certainly different in later releases.
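
A quick way to see which of the services above are up on a host could look like the sketch below. It assumes the service names in the list are also the systemd unit names, which may not hold on every release, and since not every service necessarily runs on every host, the output is only a starting point. All function names are hypothetical.

```python
import subprocess

# Service names copied from the list above; treating them as systemd unit
# names is an assumption.
EXPECTED = [
    "designate-api",
    "designate-sink",
    "designate-mdns",
    "designate-pool-manager",
    "designate-central",
]
LEFTOVER = ["designate-agent"]  # does nothing; fine to leave stopped


def query_state(unit: str) -> str:
    """Ask systemd for a unit's state, e.g. 'active' or 'inactive'."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not available on this machine
    return result.stdout.strip()


def summarize(states: dict) -> dict:
    """Split units into expected-but-not-active and leftover-but-active."""
    return {
        "not_running": [u for u in EXPECTED if states.get(u) != "active"],
        "leftover_running": [u for u in LEFTOVER if states.get(u) == "active"],
    }


if __name__ == "__main__":
    print(summarize({u: query_state(u) for u in EXPECTED + LEFTOVER}))
```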

Initial designate/pdns node setup

A lot of the node config is puppetized, but there are several steps that need manual intervention. This is kind of a mess, and more of it should probably be moved into puppet at some point.

Initial puppet runs will complain about the inability to start mariadb. This is complicated on Stretch by the mariadb package being a bit broken. To resolve:

 root@cloudservices1004:/opt# ln -s /opt/wmf-mariadb101 /opt/wmf-mariadb10
 root@cloudservices1004:/opt# cd /opt/wmf-mariadb10
 root@cloudservices1004:/opt/wmf-mariadb10# ./scripts/mysql_install_db
   Installing MariaDB/MySQL system tables in '/srv/sqldata' ...

Once puppet runs are clean, create the initial pdns database and set up grants. First you'll need to find the socket file:

 root@cloudservices1004:/tmp# find . -name "mysql*"
 root@cloudservices1004:/tmp# mysql --socket systemd-private-2588877b841f451db29adcf4b35a2afa-mariadb.service-WEFqZB/tmp/mysql.sock
 MariaDB [(none)]> create database pdns;
 Query OK, 1 row affected (0.00 sec)
 MariaDB [(none)]> USE pdns;
 Database changed
 MariaDB [pdns]> GRANT ALL ON pdns.* TO 'pdns'@'localhost' IDENTIFIED BY '<password>';
 Query OK, 0 rows affected (0.00 sec)
 MariaDB [pdns]> GRANT ALL ON pdns.* TO 'pdns'@'<first designate node ipv4>' IDENTIFIED BY '<password>';
 Query OK, 0 rows affected (0.01 sec)
 MariaDB [pdns]> GRANT ALL ON pdns.* TO 'pdns'@'<second designate node ipv4>' IDENTIFIED BY '<password>';
 Query OK, 0 rows affected (0.01 sec)
 MariaDB [pdns]> GRANT ALL ON pdns.* TO 'pdns'@'<first designate node ipv6>' IDENTIFIED BY '<password>';
 Query OK, 0 rows affected (0.01 sec)
 MariaDB [pdns]> GRANT ALL ON pdns.* TO 'pdns'@'<second designate node ipv6>' IDENTIFIED BY '<password>';
 Query OK, 0 rows affected (0.00 sec)
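
Because the grants above are identical apart from the host, it can be less error-prone to generate them from a host list and paste the result into the MariaDB prompt. The following is a minimal sketch under that assumption; grant_statements is a hypothetical helper, the addresses are documentation-only placeholders, and the password is left as a placeholder to fill in by hand.

```python
def grant_statements(hosts, password_placeholder="<password>"):
    """Return one GRANT statement per host, de-duplicated, in input order."""
    seen = set()
    statements = []
    for host in hosts:
        if host in seen:
            continue  # skip accidental duplicates in the host list
        seen.add(host)
        statements.append(
            f"GRANT ALL ON pdns.* TO 'pdns'@'{host}' "
            f"IDENTIFIED BY '{password_placeholder}';"
        )
    return statements


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges, not real nodes.
    hosts = ["localhost", "192.0.2.1", "192.0.2.2", "2001:db8::1", "2001:db8::2"]
    for statement in grant_statements(hosts):
        print(statement)
```

The statements are only printed for review; nothing is sent to the database.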

The pdns-recursor will expect some ip alias files to exist. Those files are maintained by a timer, but to get things running right now, run


Then restart pdns-recursor.

If adding a new node (as opposed to rebuilding an existing one with a previously registered IP), you'll need to update the designate pool to include the new node. A template of the yaml pool config can be found in puppet: puppet/modules/openstack/files/mitaka/designate/eqiad1_pool_config.yml. Copy it onto an active designate node, edit as needed, and then:

 designate-manage pool update  --file ./eqiad1_pool_config.yml --dry_run true

and if everything parses properly,

   designate-manage pool update  --file ./eqiad1_pool_config.yml

Finally, whether this is a newly added node or a rebuild, we need designate to set up the pdns database schema. Online pdns guides will tell you that the pdns schema needs to be set up by hand, but designate will do it for us, manage version tracking, and insert some extra fields that designate needs to keep track of what it's managing.

 # designate-manage pool show_config
 #  # note the pool id output from this
 # designate-manage powerdns upgrade <pool id from the pool config file above>
 # designate-manage powerdns sync <pool id from the pool config file above>

Now we're ready to restart all designate services (and pdns) and see what's broken. Rebooting is easiest!

Now pdns will probably be complaining (in the syslog) about getting axfr requests for unknown domains. That's because designate doesn't automatically populate domains in pdns; it only creates them as needed when records are added. We have too many domains to wait, so the best way forward is to dump the 'domains' table from an existing node and import it on the new node.

 #  # on existing, working node:
 # mysqldump pdns domains > domains.sql
 #  # on new node
 # mysql --socket <whatever> pdns < domains.sql
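
After the import it may be worth confirming that the new node knows every domain the existing node does. The sketch below compares two lists of domain names, for example as read from the name column of the pdns 'domains' table on each node. How the lists are obtained is left out, and the helper names are hypothetical.

```python
def normalize(name: str) -> str:
    """Lower-case a zone name and drop any trailing dot for comparison."""
    return name.strip().lower().rstrip(".")


def missing_domains(existing_node, new_node):
    """Return names present on the existing node but absent on the new one."""
    have = {normalize(n) for n in new_node}
    return sorted({normalize(n) for n in existing_node} - have)


if __name__ == "__main__":
    existing = ["wmflabs.org.", "svc.eqiad.wmflabs", "56.15.185.in-addr.arpa"]
    new = ["WMFLABS.ORG"]
    for name in missing_domains(existing, new):
        print("missing on new node:", name)
```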

Finally: designate-sink needs to be able to talk to the nova proxy, in order to clean up proxies associated with a deleted VM. Add access to port 5668 for the IP of the new node to project-proxy's security group. Similarly, when creating a new node make sure it can talk to the cloud puppetmasters on port 8101 so it can clean up the certs associated with a deleted VM.

New PTR record

Example command to create a new PTR record:

root@cloudcontrol1003:~# designate --all-tenants --os-project-name admin record-create 16.172.in-addr.arpa. \
>  --name --type PTR --data cloudinstances2b-gw.svc.eqiad.wmflabs. \
>  --description "Neutron virtual router. Record created by hand"
| Field       | Value                                          |
| description | Neutron virtual router. Record created by hand |
| type        | PTR                                            |
| created_at  | 2018-11-30T13:26:47.000000                     |
| updated_at  | None                                           |
| domain_id   | 6990e139-49e6-466c-9421-46cf45f05842           |
| priority    | None                                           |
| ttl         | None                                           |
| data        | cloudinstances2b-gw.svc.eqiad.wmflabs.       |
| id          | 855a3d4d-4caf-4652-b897-d6559a64bb4e           |
| name        |                       |

New A record

Example command to create a new A record:

root@cloudcontrol1003:~# designate --all-tenants --os-project-name admin record-create svc.eqiad.wmflabs. \
>  --name cloudinstances2b-gw.svc.eqiad.wmflabs. --type A --data --description "Neutron virtual router. Record created by hand"
| Field       | Value                                          |
| description | Neutron virtual router. Record created by hand |
| type        | A                                              |
| created_at  | 2018-11-30T13:23:02.000000                     |
| updated_at  | None                                           |
| domain_id   | 114f1333-c2c1-44d3-beb4-ebed1a91742b           |
| priority    | None                                           |
| ttl         | None                                           |
| data        |                                     |
| id          | a92d47c2-aec3-4777-ac9f-cb246b3c9e13           |
| name        | cloudinstances2b-gw.svc.eqiad.wmflabs.         |

Updating an existing record

Example command for updating an existing A record:

# Update A record:
# designate --all-tenants --os-project-name admin record-update --name tools.db.svc.eqiad.wmflabs. --type A --data --description "ToolsDB server. Record created by hand" df88fcb3-fbc2-42f1-bb12-2424c8b7117e 522952da-b43b-4032-bcb5-df04b6a1cdbc
# which means (uids translated):
# designate --all-tenants --os-project-name admin record-update --name tools.db.svc.eqiad.wmflabs. --type A --data --description "ToolsDB server. Record created by hand" db.svc.eqiad.wmflabs. tools.db.svc.eqiad.wmflabs.

TODO: verify me and use proper syntax highlighting.

Detecting leaked records

In some cases, DNS records may leak, for example when a record outlives the instance it pointed to. We have a custom script to detect and correct them: wmcs-novastats-dnsleaks. If this script is run with the --delete argument, it will delete leaked records, which is useful if there are many of them.

root@cloudcontrol1003:~# wmcs-novastats-dnsleaks
A record for huggle-pg.huggle.eqiad.wmflabs. has multiple IPs: ['', '']
This needs cleanup but that isn't implemented and almost never happens.
a6cf149d-d90f-401c-a380-a83b65a69d79 is linked to missing instance cloudinstances2b-gw.admin.eqiad.wmflabs.
PTR e06fdc2d-2bcd-40ec-b416-ca4d63f0dce2 is linked to missing instance ci-jessie-wikimedia-1099005.contintcloud.eqiad.wmflab
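
The two kinds of findings in the output above (an A record with several IPs, and a record linked to a missing instance) can be illustrated with a small sketch. This is not the implementation of wmcs-novastats-dnsleaks; the function, the input shape, and the sample data are all hypothetical.

```python
def find_leaks(records, live_instances):
    """Report A records with several IPs and records whose instance is gone.

    records: iterable of (record_id, fqdn, list_of_ips) tuples.
    live_instances: set of FQDNs for instances that currently exist.
    """
    multi_ip = []
    orphaned = []
    for record_id, fqdn, ips in records:
        if len(set(ips)) > 1:
            multi_ip.append(fqdn)  # one name resolving to several addresses
        if fqdn not in live_instances:
            orphaned.append(record_id)  # no instance left behind this record
    return multi_ip, orphaned


if __name__ == "__main__":
    # Made-up records using documentation-only addresses.
    records = [
        ("id-1", "a.proj.eqiad.wmflabs.", ["192.0.2.1", "192.0.2.2"]),
        ("id-2", "gone.proj.eqiad.wmflabs.", ["192.0.2.3"]),
    ]
    print(find_leaks(records, {"a.proj.eqiad.wmflabs."}))
```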