RIPE Atlas

From Wikitech

RIPE Atlas is a distributed network monitoring project that measures reachability and latency. There are two device types, Probes and Anchors. Probes are small, USB-powered appliances, while Anchors are 1U rack-mounted equipment. Probes and Anchors test connectivity to remote Anchors and DNS root servers, and report their results to the Atlas website. WMF hosts 4 Anchors.

The Atlas has been used to measure things like AAAA filtering, DNS root server reachability, and Internet routing response to hurricanes: https://atlas.ripe.net/results/analyses/

In addition to the stats you can get from RIPE's site, we track some statistics of our own: https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 and https://grafana.wikimedia.org/d/lU6QJQJnk/atlas-worldmap

Run tests from the command line

Atlas has a suite of command-line tools to interact with its API. SRE has the tools installed on the "cluster management" production hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet), where they can be run as the 'atlas' user. All tools are also aliased with the correct sudo invocation for convenience. For example, to run an SSL certificate test from 99 Italian probes:

cumin1001:~$ source /etc/ripeatlas.alias # load sudo aliases
cumin1001:~$ asslcert --target text-lb.esams.wikimedia.org --from-country it --probes 99 --no-report

Looking good!  Your measurement was created and details about it can be found here:

  https://atlas.ripe.net/measurements/22900971/

cumin1001:~$
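The measurement URL printed above can be turned into an API query for the raw results. The following is a minimal sketch, not part of the installed tooling: the helper names measurement_id and results_url are hypothetical, and it assumes the tool output keeps the URL format shown in the example.

```shell
# Hypothetical helper: read tool output on stdin and print the numeric ID
# of the first atlas.ripe.net measurement URL found in it.
measurement_id() {
    sed -n 's|.*https://atlas\.ripe\.net/measurements/\([0-9][0-9]*\)/.*|\1|p' | head -n 1
}

# Hypothetical helper: build the RIPE Atlas API v2 URL that serves the
# results of the measurement whose ID is passed as $1.
results_url() {
    printf 'https://atlas.ripe.net/api/v2/measurements/%s/results/\n' "$1"
}
```

For example, `printf '%s\n' "$tool_output" | measurement_id` prints the ID, and passing that ID to results_url gives a URL that can be fetched with any HTTP client from a machine with Internet access.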

Country latency measurement

latency-measurement can be used to automate the measurement of latency of each country to the various WMF servers.

Anchor setup

RIPE NCC doc: https://atlas.ripe.net/docs/howtos/installing-vm-anchor.html


Tracked in: https://phabricator.wikimedia.org/T307021

  1. Request an Anchor on https://atlas.ripe.net/anchors/apply/
  2. If not already present, add the sandbox vlan to Netbox, the switch/router, the hypervisors and Puppet (network/data/data.yaml). See other sites like esams for example config.
    • For routed Ganeti, make sure the sandbox range is configured on Ganeti and not as a vlan (cf. T402372, T403580)
  3. On a Cumin host: create the VM with these parameters: sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 2 --disk 50 --network sandbox --os none --cluster XXX --group YYY atlasZZZZ
  4. In Netbox, edit the newly created VM and set its tenant to "RIPE NCC".
  5. On the primary hypervisor: Enable SPICE for that VM
  6. On the primary hypervisor: find which nodes are the primary and secondary for that VM: sudo gnt-instance info atlasZZZZ.wikimedia.org
  7. On the primary and secondary nodes download the Atlas image: https_proxy=http://webproxy:8080 wget https://<custom url provided by RIPE>.iso -O /tmp/anchor.iso
  8. Start the VM: sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=/tmp/anchor.iso atlasZZZZ.wikimedia.org
  9. Use SPICE to press enter at the GRUB screen
  10. Once the installer is running set the boot order back to disk: sudo gnt-instance modify --hypervisor-parameters=boot_order=disk atlasZZZZ.wikimedia.org
  11. Delete the previously downloaded image (rm /tmp/anchor.iso)
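Steps 6 to 11 above involve several hosts and one manual action, so it can help to print them as a checklist for a given VM before starting. The following is a dry-run sketch only: the function name anchor_install_plan is hypothetical, it executes nothing, and it simply substitutes the VM name and the ISO URL provided by RIPE into the commands listed above.

```shell
# Dry-run sketch: print, in order, the commands from steps 6-11.
# $1: VM FQDN (e.g. atlasZZZZ.wikimedia.org)
# $2: the custom ISO URL provided by RIPE
anchor_install_plan() {
    plan_vm="$1"
    plan_iso_url="$2"
    plan_iso_path="/tmp/anchor.iso"
    cat <<EOF
# Step 6, on the primary hypervisor: find the primary and secondary nodes
sudo gnt-instance info ${plan_vm}
# Step 7, on the primary and secondary nodes: download the image
https_proxy=http://webproxy:8080 wget ${plan_iso_url} -O ${plan_iso_path}
# Step 8: boot the VM from the ISO
sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=${plan_iso_path} ${plan_vm}
# Step 9: use SPICE to press enter at the GRUB screen (manual)
# Step 10, once the installer is running: boot from disk again
sudo gnt-instance modify --hypervisor-parameters=boot_order=disk ${plan_vm}
# Step 11, on both nodes: delete the image
rm ${plan_iso_path}
EOF
}
```

Printing the plan rather than running it keeps the manual SPICE step visible and leaves the operator in control of which host each command runs on.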

Runbooks

HTTP checks failing

  1. Check whether the anchor replies to pings
    • If the anchor replies to pings (over both v4 and v6), the issue is most likely with RIPE.
    • If the anchor doesn't reply to ping
      • If a VM
        • Check that the Ganeti cluster is healthy, and the VM is running: sudo gnt-instance info <fqdn>
        • If it's not running, try to start it manually once: sudo gnt-instance start <fqdn>
      • If codfw (baremetal anchor)
        • Open a DCops task to powercycle the anchor

In all cases, open a low priority I/f task as well.
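The triage above can be sketched as a small shell helper. This is an illustration under stated assumptions, not an existing tool: the function names answers_ping and next_action are hypothetical, the ping flags assume the iputils version of ping, and a failure over either IP version is treated as "doesn't reply", which the runbook does not spell out.

```shell
# Hypothetical helper: succeed if host $2 answers a ping over IP version $1 (4 or 6).
# Assumes iputils ping (-4/-6 to select the protocol, -W for the timeout in seconds).
answers_ping() {
    ping "-$1" -c 3 -W 2 "$2" >/dev/null 2>&1
}

# Hypothetical helper: map ping results to the next action in the runbook.
# $1, $2: "ok" or "fail" for IPv4 and IPv6; $3: "vm" or "baremetal".
# Assumption: a failure over either IP version counts as "doesn't reply".
next_action() {
    if [ "$1" = ok ] && [ "$2" = ok ]; then
        echo "Anchor answers pings: issue is most likely with RIPE"
    elif [ "$3" = vm ]; then
        echo "Check the Ganeti cluster and VM state; if stopped, try starting it once"
    else
        echo "Open a DCops task to powercycle the anchor"
    fi
}
```

A typical use would set `v4=fail; answers_ping 4 "$host" && v4=ok`, do the same for v6, and then call `next_action "$v4" "$v6" vm`.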