RIPE Atlas

From Wikitech

RIPE Atlas is a distributed network monitoring project that measures reachability and latency. There are two device types: Probes and Anchors. Probes are small, USB-powered appliances, while Anchors are 1U rack-mounted equipment. Probes and Anchors test connectivity to remote Anchors and DNS root servers, and report their results to the Atlas website. WMF hosts 4 Anchors.

The Atlas has been used to measure things like AAAA filtering, DNS root server reachability, and Internet routing response to hurricanes: https://atlas.ripe.net/results/analyses/

In addition to the stats you can get from RIPE's site, we track some statistics of our own: https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1

Run tests from the command line

Atlas has a suite of command-line tools for interacting with its API. SRE has these tools installed on the "cluster management" production hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet), where they can be run as the 'atlas' user. For convenience, every tool is also aliased with the correct sudo invocation. For example, to run an SSL certificate test from 99 Italian probes:

cumin1001:~$ source /etc/ripeatlas.alias # load sudo aliases
cumin1001:~$ asslcert --target text-lb.esams.wikimedia.org --from-country it --probes 99 --no-report

Looking good!  Your measurement was created and details about it can be found here:

  https://atlas.ripe.net/measurements/22900971/

cumin1001:~$
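
The measurement URL printed above ends in a numeric ID, which is what the RIPE Atlas REST API uses to identify a measurement. The following is a minimal sketch, not part of the installed tooling: the function names `measurement_id_from_url` and `results_url` are hypothetical helpers introduced here, and the only external fact assumed is the public results endpoint `https://atlas.ripe.net/api/v2/measurements/<id>/results/`.

```shell
# Hypothetical helper: extract the numeric measurement ID from a
# RIPE Atlas measurement URL such as the one printed by the tools.
measurement_id_from_url() {
  url="${1%/}"      # drop one trailing slash, if present
  id="${url##*/}"   # keep everything after the last remaining slash
  # Reject anything that is empty or not purely numeric.
  case "$id" in
    ''|*[!0-9]*)
      echo "not a measurement URL: $1" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$id"
}

# Hypothetical helper: build the public API URL for the results of
# the measurement behind a given measurement page URL.
results_url() {
  mid="$(measurement_id_from_url "$1")" || return 1
  printf 'https://atlas.ripe.net/api/v2/measurements/%s/results/\n' "$mid"
}

results_url "https://atlas.ripe.net/measurements/22900971/"
```

The resulting URL can then be fetched with any HTTP client. On production hosts this presumably has to go through the web proxy, which the sketch does not handle.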

Country latency measurement

latency-measurement can be used to automate measuring the latency from each country to the various WMF servers.

Anchor setup

RIPE NCC doc: https://atlas.ripe.net/docs/howtos/installing-vm-anchor.html

Tracked in: https://phabricator.wikimedia.org/T307021

  1. If not already present, add the sandbox vlan to Netbox, the switch/router, the hypervisors and Puppet (network/data/data.yaml). See other sites, such as esams, for example config.
  2. On a Cumin host: create the VM with these parameters: sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 2 --disk 50 --network sandbox --os none --cluster XXX --group YYY atlasZZZZ
  3. In Netbox, edit the newly created VM and set its tenant to "RIPE NCC".
  4. On the primary hypervisor: Enable SPICE for that VM
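  5. On the primary hypervisor: Attempt to start the VM. This first attempt is expected to fail with an error message naming a hypervisor node (see the next step): sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=/tmp/anchor.iso atlasZZZZ.wikimedia.org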
  6. Connect to the hypervisor indicated by the error message (eg. "Hypervisor parameter validation failed on node ganeti3007.esams.wmnet") and download the Atlas image: https_proxy=http://webproxy:8080 wget https://ks.atlas.ripe.net/misc/anchor.iso -O /tmp/anchor.iso
  7. Set up the SPICE port forwarding
  8. Start the VM for real this time, using the command from step 5.
  9. Quickly (there is a GRUB timeout), select the manual networking config in the SPICE window.
  10. Set up the IP config based on the IPs allocated to the VM in Netbox. Note that the Backspace key doesn't work, so type carefully.
  11. Once the installer is running set the boot order back to disk: sudo gnt-instance modify --hypervisor-parameters=boot_order=disk atlasZZZZ.wikimedia.org
  12. Delete the previously downloaded image (rm /tmp/anchor.iso)
  13. Run the https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/ Netbox script
  14. Run Homer on the core routers
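
The hypervisor-side commands in the steps above all share the same VM name and image path, so it can help to generate them in one go before pasting. The following is a minimal sketch that only prints the commands and runs nothing. The function name `anchor_hypervisor_commands` is hypothetical, and the commands themselves are copied from the steps above. It assumes you already know which hypervisor node needs the image, and it does not cover the SPICE, Netbox, installer or Homer steps, which remain manual.

```shell
# Hypothetical helper: print (do not run) the hypervisor-side commands
# for one anchor VM, in the order in which they take effect.
# Usage: anchor_hypervisor_commands <vm-fqdn>
anchor_hypervisor_commands() {
  vm="${1:-}"
  iso="/tmp/anchor.iso"   # image path used throughout the steps above
  if [ -z "$vm" ]; then
    echo "usage: anchor_hypervisor_commands <vm-fqdn>" >&2
    return 1
  fi
  printf '%s\n' \
    "https_proxy=http://webproxy:8080 wget https://ks.atlas.ripe.net/misc/anchor.iso -O $iso" \
    "sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=$iso $vm" \
    "sudo gnt-instance modify --hypervisor-parameters=boot_order=disk $vm" \
    "rm $iso"
}

anchor_hypervisor_commands atlasZZZZ.wikimedia.org
```

Printing rather than executing is deliberate: the steps between these commands happen in the SPICE window, so the commands cannot be run back to back.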