Wikimedia DNS/Administration
Adding a new Wikidough host
This section describes the steps involved in setting up a new Wikidough host. Read it from start to finish at least once before you begin, so that you understand how the steps fit together.
1. Puppet Role
- Start by looking in operations/puppet: manifests/site.pp to check the existing Wikidough hosts:
# Wikidough (T252132)
node /^(doh[123456]00[12])\.wikimedia\.org$/ {
    role(wikidough)
}
- If you want to deploy in eqiad, the new hostname will be doh1003 (based on the above).
The rest of this document uses the hostname doh1003FIXME (the "FIXME" suffix helps prevent erroneous copy-pastes).
- Create a patch for the above, adding the hostname doh1003FIXME to the regular expression in site.pp.
- Commit message: "site: add role for doh1003FIXME (Wikidough eqiad)"
- Example commit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/757636.
- Submit and merge the patch.
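Before submitting the site.pp patch, it can help to confirm which hostnames a node regex actually selects. The following is a minimal sketch, not part of the Puppet tooling: it uses Python's re module, which handles this particular pattern the same way as Puppet's Ruby regexes. EXISTING is copied from the node definition above, while PROPOSED is a hypothetical widening (it admits host number 3 at every site, which may be broader than what the example commit does), and doh1003 stands in for doh1003FIXME.

```python
import re

# Pattern copied from the site.pp node definition quoted above.
EXISTING = re.compile(r"^(doh[123456]00[12])\.wikimedia\.org$")
# Hypothetical widened pattern (an assumption) that also admits host number 3.
PROPOSED = re.compile(r"^(doh[123456]00[123])\.wikimedia\.org$")


def matching(pattern, fqdns):
    """Return the FQDNs that the given node regex would select."""
    return [fqdn for fqdn in fqdns if pattern.match(fqdn)]


candidates = [
    "doh1001.wikimedia.org",
    "doh1002.wikimedia.org",
    "doh1003.wikimedia.org",  # the new host, without the FIXME suffix
    "doh7001.wikimedia.org",  # should never match
]

print("existing:", matching(EXISTING, candidates))
print("proposed:", matching(PROPOSED, candidates))
```

If the new hostname appears only in the second list, the patch changes exactly what it is meant to change for these candidates.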
2. acme-chief
acme-chief is used to issue the TLS certificates for the Wikidough hosts. When you add a new host, you have to add the hostname to acme-chief's config.
- In operations/puppet, under the wikidough section in hieradata/role/common/acme_chief.yaml, update the regex under authorized_regexes to add the new hosts.
- Commit message: "acme_chief: authorize doh1003FIXME host for Wikidough"
- Example commit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/757628
- Submit and merge the patch.
3. Ganeti VM
The next step is to create a Ganeti VM.
- Start by creating a task in Phabricator (example: https://phabricator.wikimedia.org/T300156) with the following information, for transparency:
- Title: Create Ganeti VM for Wikidough in eqiad
- Body:
Specifications:
Hostname: doh1003FIXME (eqiad)
vCPUs: 2
Memory: 8
Disk: 15G
Network: Public
- Add the tags Traffic and SRE.
- Create the task.
Now create the VM:
- Follow the process for the makevm cookbook documented at https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM to create the Ganeti VM. Run the command below on a cumin host:
sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 8 --disk 15 --network public eqiad_A doh1003FIXME
- If you are deploying multiple VMs, make sure to spread them across different rows.
- To find the rows of the existing hosts, run:
sudo gnt-node list -o name,group
from a Ganeti master node.
- When the cookbook finishes, note the MAC address. If you missed that in the output, run:
sudo gnt-instance show doh1003FIXME.wikimedia.org | grep -A 2 NIC
- In operations/puppet, edit: modules/install_server/files/dhcpd/linux-host-entries.ttyS0-115200 and add the MAC address from above.
- Example commit: https://gerrit.wikimedia.org/r/c/operations/puppet/+/757633.
- Commit message: "install_server: add MAC address of doh1003FIXME"
- Submit and merge the patch, then finalize the change from the cumin host:
sudo cumin A:installserver 'run-puppet-agent'
- Ensure you have a working VM with the Wikidough role applied (boot order set to disk, Puppet certs signed, etc.) before proceeding with the next steps.
- An all-green status on Icinga and no alerts on #wikimedia-operations are good signs that everything is working as intended.
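The MAC address lookup earlier in this section can also be done programmatically if you are handling several VMs. The sketch below simply pulls the first MAC-shaped string out of a block of text. The sample text is a hypothetical excerpt: the real layout of the gnt-instance show output may differ, and the address shown is made up.

```python
import re

# Six colon-separated pairs of hex digits.
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def first_mac(text):
    """Return the first MAC address found in text (lowercased), or None."""
    match = MAC_RE.search(text)
    return match.group(0).lower() if match else None


# Hypothetical excerpt (an assumption); paste the real command output here.
sample = """\
  NICs:
    - nic/0:
      MAC: AA:00:00:12:34:56
      IP: None
"""

print(first_mac(sample))
```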
Possible Issues
- acme_chief errors and/or failures of dnsdist.service can be resolved by doing two consecutive Puppet agent runs so that the TLS certs are fetched and made available to the new host.
4. Homer and Anycast
- Now we will need to configure homer to complete the Wikidough anycast setup.
- If you haven't already done so, clone the homer repository from https://gerrit.wikimedia.org/g/operations/homer/public.
- Copy the IP of the doh1003FIXME host (not the VIP, so nothing in 185.71.138.0/24) and add it to config/sites.yaml under the relevant data center.
- Example commit: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/698971
- Get the patch reviewed by the netops team: add Arzhel and/or Cathal as reviewers. It's best to wait for their review before proceeding.
- Merge the patch and then on cumin, run:
sudo run-puppet-agent
To finalize the homer changes (continuing on cumin), run the following, replacing "eqiad" in both "cr*-eqiad*" and the commit message with the name of the relevant data center:
homer "cr*-eqiad*" commit "Gerrit <REPLACE WITH HOMER CHANGE ID FROM ABOVE>: Set up BGP peering to doh1003FIXME in eqiad, triggering DoH /24 announcement there."
If you are running this in ulsfo, the above command will be:
homer "cr*-ulsfo*" commit "Gerrit <REPLACE WITH HOMER CHANGE ID FROM ABOVE>: Set up BGP peering to doh1003FIXME in ulsfo, triggering DoH /24 announcement there."
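The two invocations above differ only in the site name, which appears in both the device pattern and the commit message. As an illustration, the hypothetical helper below (homer_commit_argv is not part of homer) builds the same argument list for any site, so the two occurrences cannot drift apart. The change ID is left as a placeholder, as in the commands above.

```python
import shlex


def homer_commit_argv(site, host, change_id):
    """Build the argument list for the homer commit shown above, for one site."""
    devices = f"cr*-{site}*"  # matches the core routers of that site
    message = (
        f"Gerrit {change_id}: Set up BGP peering to {host} in {site}, "
        "triggering DoH /24 announcement there."
    )
    return ["homer", devices, "commit", message]


argv = homer_commit_argv("ulsfo", "doh1003FIXME", "<CHANGE ID>")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```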
- Also log the above message in #wikimedia-operations for transparency.
- Review the output. Make sure that it matches the host IP you just added. Type "yes" to commit.
- You will need to type "yes" repeatedly, once for each core router.
If everything goes well, you should have a working Wikidough host.
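One easy mistake in this section is adding the anycast VIP to config/sites.yaml instead of the host's own address. The check below is a small sketch using Python's standard ipaddress module. The only value taken from this document is the 185.71.138.0/24 range; the addresses passed in at the end are illustrative examples, not real host assignments.

```python
import ipaddress

# Anycast VIP range named in the text; host addresses must fall outside it.
DOH_VIP_NET = ipaddress.ip_network("185.71.138.0/24")


def is_host_address(ip):
    """Return True if ip lies outside the Wikidough anycast VIP range."""
    return ipaddress.ip_address(ip) not in DOH_VIP_NET


# Illustrative addresses only (assumptions, not real assignments).
for candidate in ("185.71.138.138", "198.51.100.7"):
    verdict = "ok for sites.yaml" if is_host_address(candidate) else "VIP range, do not use"
    print(candidate, "->", verdict)
```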
Post-Setup Notes
- Add the new host to Wikidough's integration test, knead-wikidough.
- Clone the knead-wikidough repository. In tests/test_dns.py, add the new host IP to DOUGH_HOSTS.
- Commit the change.
- knead-wikidough will check for DoH and DoT settings against the new host and a successful CI check indicates that the new host is working as intended.
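For readers unfamiliar with what a DoH check sends on the wire, here is a minimal sketch of an RFC 8484 GET request URL, built with the standard library only. It is not knead-wikidough's implementation, and the endpoint https://doh.example/dns-query is a placeholder: substitute the address of the host under test. The block only constructs the URL and performs no network request.

```python
import base64
import struct


def build_query(name, qtype=1):
    """Encode a minimal DNS query for name in RFC 1035 wire format."""
    # Header: ID 0 (as RFC 8484 suggests), RD flag set, exactly one question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A) followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint, name):
    """Return a GET URL carrying the query as unpadded base64url."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=")
    return f"{endpoint}?dns={encoded.decode('ascii')}"


# Placeholder endpoint (an assumption); replace with the host under test.
print(doh_get_url("https://doh.example/dns-query", "wikipedia.org"))
```

Fetching the resulting URL with an Accept header of application/dns-message should return a binary DNS response if the host is serving DoH correctly.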
Restarting services
Multiple services make up the Wikimedia DNS stack. systemd ordering should handle dependencies automatically when any of the services is restarted, but we still need to disable Puppet, schedule Icinga downtime, and so on. To that end, we use a cookbook (sre.dns.roll-restart-wikimedia-dns) that restarts the whole stack cleanly. The cookbook requires no custom parameters, so run it like any other cookbook.