Network cheat sheet

From Wikitech

This document is about working on the Juniper devices used in the Wikimedia Infrastructure.

SSH access to network equipment

Junipers take ssh keys. Huzzah! WMF routers and switches follow the Infrastructure naming conventions.

For example, the hostnames of eqiad core routers are cr1-eqiad.wikimedia.org and cr2-eqiad.wikimedia.org:

ssh cr1-eqiad.wikimedia.org

Access switches are named asw-${rownum}-${dc}.mgmt.${dc}.wmnet. Hence, row B switches in eqiad and codfw can be accessed as follows:

ssh asw-b-eqiad.mgmt.eqiad.wmnet
ssh asw-b-codfw.mgmt.codfw.wmnet
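
The naming convention above can be sketched as a small Python helper. This is only an illustration: the function names core_router_hostname and access_switch_hostname are hypothetical, and the patterns are inferred solely from the examples on this page.

```python
def core_router_hostname(number: int, dc: str) -> str:
    """Core routers: cr<number>-<dc>.wikimedia.org (pattern inferred from the examples)."""
    return f"cr{number}-{dc}.wikimedia.org"


def access_switch_hostname(row: str, dc: str) -> str:
    """Access switches: asw-<row>-<dc>.mgmt.<dc>.wmnet, with the row letter in lowercase."""
    return f"asw-{row.lower()}-{dc}.mgmt.{dc}.wmnet"


if __name__ == "__main__":
    # Print the ssh commands for the row B switches in both data centers.
    for dc in ("eqiad", "codfw"):
        print("ssh", access_switch_hostname("b", dc))
```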

When connected to one member of an SRX cluster, to connect to the second node run:

request routing-engine login node <node#>

Juniper

How to block an IP address

On the core routers, add the IPv4/IPv6 address to the blackhole4/blackhole6 prefix list. For example, to block 192.0.2.1 and 2001:db8::1:

$ ssh cr1-eqiad.wikimedia.org 
jbond@re0.cr1-eqiad> edit
jbond@re0.cr1-eqiad# set policy-options prefix-list blackhole4 192.0.2.1/32
jbond@re0.cr1-eqiad# set policy-options prefix-list blackhole6 2001:db8::1/128
jbond@re0.cr1-eqiad# show | compare
jbond@re0.cr1-eqiad# commit and-quit
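
When several addresses need blocking, the set lines can be generated rather than typed by hand. A minimal sketch, assuming only the blackhole4/blackhole6 prefix list names shown above; the function name blackhole_commands is hypothetical, and the output still has to be pasted in configuration mode, reviewed with show | compare and committed.

```python
import ipaddress


def blackhole_commands(addresses):
    """Return one Junos 'set' line per address.

    Invalid addresses raise ValueError before anything is generated for them.
    """
    commands = []
    for text in addresses:
        addr = ipaddress.ip_address(text)  # validates and detects the address family
        prefix_list = "blackhole4" if addr.version == 4 else "blackhole6"
        # max_prefixlen is 32 for IPv4 and 128 for IPv6, i.e. a single host
        commands.append(
            f"set policy-options prefix-list {prefix_list} {addr}/{addr.max_prefixlen}"
        )
    return commands


if __name__ == "__main__":
    print("\n".join(blackhole_commands(["192.0.2.1", "2001:db8::1"])))
```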

Operational mode vs Configuration mode

Juniper devices can be used in two ways:

  • Operational mode (default when logging in):
{master}
elukey@re0.cr2-eqiad>
  • Configuration mode (to apply network configuration changes):
elukey@re0.cr2-eqiad> edit
Entering configuration mode

{master}[edit]
elukey@re0.cr2-eqiad#

Juniper 101 from an IRC session with Faidon:

11:53 <paravoid> there are two modes in the cli
11:53 <paravoid> the operational mode and the configuration mode
11:54 <paravoid> when you first login you enter the operational one
11:54 <paravoid> so "show interfaces ae3" shows you the state of the interface 
                 (link speed, physical link etc.)
11:54 <paravoid> and "show bgp summary" shows you the BGP summary etc.
11:55 <paravoid> and a few other commands not in the show hierarchy like
                 "request routing-engine login" and whatnot
11:56 <paravoid> to view the config in the operational mode you do "show configuration ..."
11:56 <paravoid> if you want to edit the config, you enter the config mode
11:56 <paravoid> by typing "edit"
11:56 <paravoid> (and leave it with "quit" or "exit")
11:56 <paravoid> once you're there, "show" does something entirely different
11:57 <paravoid> it basically does what "show configuration" does in the operational mode
11:57 <paravoid> so "show interfaces ae3" will show you the config section for interface ae3
11:57 <paravoid> and "show" will do the same as "show configuration" in the operational mode
11:58 <paravoid> so the config now
11:59 <paravoid> there are two ways of viewing it (and editing it, but that's more complicated)
11:59 <paravoid> one is the hierarchical view, the other one is set
11:59 <paravoid> the hierarchical is the thing you see with "show"
11:59 <paravoid> system { domain-name ...; services { ssh { root-login allow; } } }
12:00 <paravoid> set is the thing you see with "| display set"
12:00 <paravoid> in the config mode, you can navigate the hierarchy with edit
12:00 <paravoid> so while you're in there (having typed "edit")
12:00 <paravoid> you can type
12:00 <paravoid> "edit system"
12:01 <paravoid> and then you're only under the system part of the hierarchy
12:01 <paravoid> so "show", no arguments, will show you only what's under system
12:01 <paravoid> and "show services" will show everything that's under "system { services { ... } }"
12:02 <paravoid> similarly you can go deeper by typing "edit services",
                 or if you're at the root "edit system services"
12:02 <paravoid> same with set
12:02 <paravoid> "set" takes a relative path
12:03 <paravoid> so in your case, you can do

# The chat was about editing the analytics-in4 input filter
# (rules for all the ports in the Analytics VLAN)

12:03 <paravoid> set firewall family inet filter analytics-in4 term mysql from destination-address 10.64.37.14/32
12:03 <paravoid> or
12:03 <paravoid> edit firewall family inet filter analytics-in4
12:03 <paravoid> set term mysql from destination-address 10.64.37.14/32
12:04 <paravoid> (or any other combination)
12:05 <paravoid> oh and you can navigate the other way out of the hierarchy
                 by typing "up" or if you want to go to the root with "top"
12:05 <paravoid> "| display set" shows the set command from the root
12:05 <paravoid> so if you're in "edit firewall family inet filter analytics-in4"
                 and type "show | display set" or "show term mysql | display set"
                 you can paste the output as it is.
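
The relative-path behaviour of "set" described in the chat can be illustrated with a few lines of Python. This is a sketch of the idea only, not of how Junos parses commands; absolute_set is a hypothetical helper name.

```python
def absolute_set(edit_path: str, relative_command: str) -> str:
    """Rewrite a 'set ...' typed under 'edit <edit_path>' as the equivalent
    command from the root of the hierarchy (what '| display set' prints)."""
    verb, _, rest = relative_command.strip().partition(" ")
    if verb != "set" or not rest:
        raise ValueError("expected a command of the form 'set <path> ...'")
    # An empty edit_path means we are already at the root ("top").
    return " ".join(["set", *edit_path.split(), rest])


if __name__ == "__main__":
    print(absolute_set(
        "firewall family inet filter analytics-in4",
        "set term mysql from destination-address 10.64.37.14/32",
    ))
```

Both forms from the chat end up as the same root-level command, which is why the output of "show | display set" can be pasted from anywhere in the hierarchy after typing "top".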

Rollbacks

It is always useful to know the basic rollback procedures when operating any service: mistakes happen, and being ready to revert a change quickly is valuable.

In configuration mode, the rollback ? command lists the most recent commits:

--- JUNOS 13.3R9.13 built 2016-03-01 07:03:30 UTC
{master}
elukey@re0.cr1-eqiad> edit
Entering configuration mode

{master}[edit]
elukey@re0.cr1-eqiad# rollback ?
Possible completions:
  <[Enter]>            Execute this command
  0                    2017-02-09 17:03:58 UTC by elukey via cli commit synchronize
  1                    2017-02-08 18:54:10 UTC by bblack via cli commit synchronize
  2                    2017-02-08 17:40:02 UTC by elukey via cli commit synchronize
  3                    2017-02-08 15:24:44 UTC by elukey via cli commit synchronize
  4                    2017-02-03 17:57:52 UTC by filippo via cli commit synchronize
  [..]

The most recent commit is numbered 0, 1 is the one made right before it, and so on.

Two of the most common rollback use cases are:

  • Changes that are not going to be committed because of some issue (for example, show | compare does not return the expected outcome). In this case, discard every uncommitted change with rollback 0 (a sort of git reset).
  • Changes already committed that caused issues and need to be reverted. The faulty commit should be the last one (number 0), and you want to go back to the last known good state before it. rollback X (where X is the commit number) loads the configuration as it was at commit X, undoing all the differences between X and the last commit. Please note that you'll need to commit after the rollback!
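
The numbering and the two use cases can be pictured with a toy Python model. It is an illustration under simplifying assumptions (the configuration is a plain dict, and the class ToyRouter and its methods are hypothetical), not a description of how Junos is implemented.

```python
class ToyRouter:
    """Tiny model of a commit history: history[0] is the active configuration."""

    def __init__(self, initial):
        self.history = [dict(initial)]
        self.candidate = dict(initial)  # what you edit in configuration mode

    def set(self, key, value):
        self.candidate[key] = value  # uncommitted change

    def rollback(self, n=0):
        # Load commit number n as the candidate; nothing is active until commit().
        self.candidate = dict(self.history[n])

    def commit(self):
        # The committed candidate becomes number 0; older commits shift by one.
        self.history.insert(0, dict(self.candidate))


if __name__ == "__main__":
    router = ToyRouter({"ae3": "enabled"})

    # Use case 1: discard uncommitted changes.
    router.set("ae3", "disabled")
    router.rollback(0)
    print(router.candidate)   # same as the active configuration again

    # Use case 2: revert a faulty commit, then commit the rollback.
    router.set("ae3", "disabled")
    router.commit()           # faulty commit is now number 0
    router.rollback(1)        # load the last known good state
    router.commit()           # required, otherwise nothing changes
    print(router.history[0])
```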

Show diff between two commits

elukey@asw-a-eqiad> show system rollback compare 1 0
[edit interfaces interface-range vlan-private1-a-eqiad]
+    member ge-2/0/23;
[edit interfaces]
+   ge-2/0/23 {
+       description db1107;
+       enable;
+   }

Edit ACLs for Network ports

We apply ACLs on the router's network ports to filter inbound traffic via Juniper's input filters. Please note that inbound here is from the port's point of view, not from that of whatever is attached to it (like a switch or a host). So every input filter applied to a specific port (or set of ports) filters traffic arriving at the router's port.

Real use case scenario: allow every host in the Analytics VLAN to connect to dbproxy1010.eqiad.wmnet on port 3306.

# Random host belonging to the Analytics VLAN:
# analytics1034.eqiad.wmnet

# Find the port used to reach analytics1034.eqiad.wmnet
elukey@re0.cr1-eqiad> show route analytics1034.eqiad.wmnet
[..]
10.64.36.0/24      *[Direct/0] 22w5d 22:31:05
                    > via ae3.1022
                    
# Check ACLs applied to the port
elukey@re0.cr1-eqiad> show configuration interfaces ae3.1022
description "Subnet analytics1-c-eqiad";
vlan-id 1022;
family inet {
    filter {
        input analytics-in4;
    }
[..]

# Check the input filter
show configuration firewall family inet filter analytics-in4
[..]
term mysql {
    from {
        destination-address {
            10.X.Y.Z/32;
            [..]
        }
        protocol tcp;
        destination-port 3306;
    }
    then accept;
}

# Add dbproxy1010's IP to the mysql term list
# This must be done in "edit" mode
elukey@re0.cr1-eqiad> edit
Entering configuration mode

{master}[edit]
elukey@re0.cr1-eqiad# set firewall family inet filter analytics-in4 term mysql from destination-address 10.64.37.14/32

# Then commit with a meaningful message (append and-quit to also leave configuration mode)
commit comment "Added dbproxy1010 to the analytics-in4 input filter"

# Do the same for dbproxy1010's port if necessary, re-applying this procedure.

Another similar use case is removing a "destination-port" from a term in an input filter:

# Check the input filter
show configuration firewall family inet filter analytics-in4
[..]
term mysql {
    from {
        destination-address {
            10.X.Y.Z/32;
            [..]
        }
        protocol tcp;
        destination-port [ 3306 8000 ];
    }
    then accept;
}

# Remove port 8000 from destination-port
delete firewall family inet filter analytics-in4 term mysql from destination-port 8000

If you want to add a comment to an IP address:

elukey@re0.cr1-eqiad# edit firewall family inet filter analytics-in4 term aqs from destination-address
{master}[edit firewall family inet filter analytics-in4 term aqs from destination-address]
elukey@re0.cr1-eqiad# show
10.64.0.107/32;
10.64.32.138/32;
10.64.48.146/32;
elukey@re0.cr1-eqiad# annotate 10.64.0.107/32 aqs100{4,5,6}
elukey@re0.cr1-eqiad# show
/* aqs100{4,5,6} */
10.64.0.107/32;
10.64.32.138/32;
10.64.48.146/32;

Change an access switch port's VLAN

For example, let's move druid1004's port from the production VLAN to the corresponding Analytics VLAN:

# Find the correct switch
elukey@druid1004:~$ sudo lldpcli show neighbors | grep SysName
    SysName:      asw-a-eqiad

# Find the host's port on the switch
elukey@asw-a-eqiad> show interfaces descriptions | match druid1004
ge-6/0/12       up    up   druid1004

# Find the current VLAN settings
elukey@asw-a-eqiad> show ethernet-switching interfaces | match ge-6/0/12
ge-6/0/12.0  up     private1-a-eqiad    1017  untagged unblocked

# Remove the private1-a-eqiad tag and add the new one
delete interfaces interface-range vlan-private1-a-eqiad member ge-6/0/12
set interfaces interface-range vlan-analytics1-a-eqiad member ge-6/0/12

Note: The above would not work for a trunk port, as a trunk port carries more than one VLAN (tagged and untagged, albeit with at most one untagged VLAN).

# Verify, commit, etc..

A nice gotcha is that the changes only need to be made on the switch ports, not on the cr1/cr2 routers. The reason is that cr1/cr2 always send traffic to the switches tagged (the ports are trunks). An indirect way to verify this is:

elukey@re0.cr1-eqiad> show route druid1004.eqiad.wmnet

inet.0: 655563 destinations, 3322224 routes (655426 active, 0 holddown, 139 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

10.64.0.0/22       *[Direct/0] 59w5d 19:35:18
                    > via ae1.1017

akosiaris@re0.cr1-eqiad> show configuration interfaces ae1.1017 
description "Subnet private1-a-eqiad";
vlan-id 1017;

Note that it's NOT necessary for the VLAN id and the interface unit to be the exact same number (e.g. 1017), but not doing so is a recipe for confusion.

In the above case the /22 prefix is routed to the ae1.1017 interface, and outbound traffic gets properly tagged. So the only thing that needs to be done beforehand is changing the IP address assigned to the host to one belonging to the new VLAN (easy to find via operations/dns).

Matching hosts with rack numbers

To find out which cache hosts are connected on codfw's row c:

ema@asw-c-codfw> show interfaces descriptions | match cp 
xe-2/0/3        up    up   cp2013
xe-2/0/4        up    up   cp2014
xe-2/0/5        up    up   cp2015
xe-7/0/3        up    up   cp2016
xe-7/0/4        up    up   cp2017
xe-7/0/5        up    up   cp2018

Interface names, reported in the first column, follow Juniper's interface naming convention (type-fpc/pic/port). The first part of the interface name, xe in the examples above, is the media type. xe stands for 10 Gigabit Ethernet; other options would have been ge for Gigabit Ethernet and et for 40 or 100 Gigabit Ethernet. The second part is the FPC, which on these switch stacks identifies the specific rack to which the host is connected. The first three hosts (cp2013, cp2014 and cp2015) are in c2 (xe-2), while cp2016, cp2017 and cp2018 are in c7 (xe-7). The middle number is the PIC, and the last number is the port.
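
The interface-to-rack mapping can be sketched in Python. The helper names parse_interface and rack_of are hypothetical, and the rule that the FPC number equals the rack number is taken from this page, so it may not hold for other device types.

```python
import re
from typing import NamedTuple


class Interface(NamedTuple):
    media: str  # e.g. "xe" or "ge"
    fpc: int    # on these switch stacks, the rack number within the row
    pic: int
    port: int


def parse_interface(name: str) -> Interface:
    """Split a name such as 'xe-2/0/3' into media type, FPC, PIC and port."""
    match = re.fullmatch(r"([a-z]+)-(\d+)/(\d+)/(\d+)", name)
    if match is None:
        raise ValueError(f"not a type-fpc/pic/port interface name: {name!r}")
    media, fpc, pic, port = match.groups()
    return Interface(media, int(fpc), int(pic), int(port))


def rack_of(row: str, interface_name: str) -> str:
    """Rack label such as 'c2': the row letter followed by the FPC number."""
    return f"{row}{parse_interface(interface_name).fpc}"


if __name__ == "__main__":
    for host, ifname in (("cp2013", "xe-2/0/3"), ("cp2016", "xe-7/0/3")):
        print(host, rack_of("c", ifname))
```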

Racktables can also be used to check the mapping between racks and hostnames.

Check reboot or downtime

Sometimes many hosts in the same rack lose connectivity at the same time; this might be due to a switch failure or reboot.

T159464 is an example: all the hosts in rack A1 were marked as DOWN in Icinga, so we checked the uptime on asw-a-codfw:

elukey@asw-a-codfw> show system uptime
fpc1:
--------------------------------------------------------------------------
Current time: 2017-03-02 17:39:56 UTC
System booted: 2017-03-02 16:55:13 UTC (00:44:43 ago)
Last configured: 2017-03-02 16:58:04 UTC (00:41:52 ago) by root
 5:39PM  up 45 mins, 0 users, load averages: 0.06, 0.06, 0.06

fpc2:
--------------------------------------------------------------------------
Current time: 2017-03-02 17:39:57 UTC
System booted: 2015-08-04 15:05:30 UTC (82w2d 02:34 ago)
[..]


asw-a1-codfw (fpc1) seems to have rebooted around 16:55, so we might want to check system logs:

elukey@asw-a-codfw> show log messages | match 16:5[45]

{master:2}

Logs can rotate and might not be displayed in the main messages logfile:

show log messages.0.gz | match 16:5[45]

Mar  2 16:54:03  asw-a-codfw vccpd[1635]: VCCPD_PROTOCOL_ADJDOWN: Lost adjacency to dc38.e1d4.1b00 on vcp-255/0/48.32768,
Mar  2 16:54:03  asw-a-codfw vccpd[1635]: interface vcp-255/0/48 went down
Mar  2 16:54:03  asw-a-codfw fpc3 [EX-BCM PIC] ex_bcm_linkscan_handler: Link 54 DOWN
Mar  2 16:54:03  asw-a-codfw vccpd[1635]: Member 2, interface vcp-255/0/48.32768 went down
[..]

Check what host owns a specific IPv6 address

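This is necessary if the PTR DNS record is not present. You can combine the IPv6 neighbor (NDP) table and the ARP table: find the MAC address for the IPv6 address, then look up the IPv4 address with the same MAC.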

elukey@re0.cr1-eqiad> show ipv6 neighbors
[..search the IP that you want..]
2620:0:861:103:92b1:1cff:fe28:d448
                             90:b1:1c:28:d4:48  stale       254 no  no      ae3.1019

elukey@re0.cr1-eqiad> show arp no-resolve | match 90:b1:1c:28:d4:48
90:b1:1c:28:d4:48 10.64.32.64     ae3.1019                 none
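
In the example above the address was autoconfigured from the MAC (modified EUI-64: ff:fe inserted in the middle and the universal/local bit flipped), so the MAC can also be derived offline. A minimal Python sketch; mac_from_eui64 is a hypothetical helper name, and it does not apply to statically assigned or privacy addresses.

```python
import ipaddress


def mac_from_eui64(ipv6: str) -> str:
    """Recover the MAC address embedded in a modified EUI-64 IPv6 address."""
    iid = ipaddress.IPv6Address(ipv6).packed[8:]  # last 64 bits: interface identifier
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface identifier is not EUI-64 based")
    # Flip the universal/local bit of the first byte and drop the ff:fe filler.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{byte:02x}" for byte in mac)


if __name__ == "__main__":
    print(mac_from_eui64("2620:0:861:103:92b1:1cff:fe28:d448"))
```

For the neighbor entry shown above this yields 90:b1:1c:28:d4:48, the same MAC address the router reports.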

Operational mode commands

show ethernet-switching table           # shows mac addresses
show ethernet-switching table interface # shows mac addresses for that interface
show ethernet-switching table vlan      # shows mac addresses for vlan
show interfaces descriptions 
Interface       Admin Link Description
ge-1/0/0        up    up   ms1001
show interfaces terse                   # shows interfaces with their IP addresses in a very short format
show interface ge-1/0/0 (extensive)     # shows the interface in more detail
monitor interface xe-1/1/0              # shows interface in a real-time updating mode (errors, bits, etc)
show log messages | last 20             # shows the last 20 lines of the messages log
show virtual-chassis vc-port statistics member 6 vcp-255/1/0 extensive  # show VC link info
show virtual-chassis vc-port statistics extensive | match "fpc|port|errors" # Show VC link errors of a whole stack
show route receive-protocol bgp <IP>   # shows prefixes a peer has announced

Config commands

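Junipers only apply configuration changes when you commit them, so you can make your changes and then double check them (for example with show | compare) before they take effect.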

  • configure - puts you in config mode
  • exit - takes you up one level (or out of) config mode
  • top - takes you to the top level of config mode
  • show - shows you configuration below that level
    • show XXX | display inheritance - Show the content of the groups applied to this level if any. (Juniper's equivalent of an "include").

MX204 (and similar) upgrade

https://www.juniper.net/documentation/en_US/junos/topics/concept/installation_upgrade.html

Previous tests show a downtime of ~5min for the vmhost reboot.

Linux

Here's some useful stuff I use to diagnose network problems:

  • IP addresses
[root@ariel root]# ip addr
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:45:5b:d0:8c brd ff:ff:ff:ff:ff:ff
    inet 207.142.131.244/26 brd 207.142.131.255 scope global eth0
    inet6 2001:470:1f01:367:250:45ff:fe5b:d08c/64 scope global deprecated dynamic
       valid_lft 1897645sec preferred_lft -89555sec
    inet6 fe80::250:45ff:fe5b:d08c/64 scope link
       valid_lft forever preferred_lft forever
5: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:45:5b:d0:8d brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/16 brd 10.0.255.255 scope global eth1
    inet6 fe80::250:45ff:fe5b:d08d/64 scope link
       valid_lft forever preferred_lft forever
6: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

iproute2 is much better than ifconfig; example:

ip addr add 10.0.0.3 dev eth0
....
ip addr del 10.0.0.3 dev eth0

with no need for eth0:X virtual interfaces

ip route add default via 10.0.0.4
  • packet sniffing; tcpdump:
tcpdump -s 1520 -lni eth0 host bacon and port not 22

(exclude port 22 to avoid seeing your own traffic)

  • or tethereal (called tshark in current Wireshark releases; nicer for some things, can decode http requests, mysql data streams etc):
tethereal -s 1520 -ni eth0 host bacon and port not 22
  • arping: like ping but uses ARP instead of ICMP (gets through firewalls on the LAN)