This page contains historical information. It may be outdated or unreliable.

At WMF

Goals

Due to the way HTTPS sessions are terminated, we will use IPsec to encrypt traffic between the caching proxy (Varnish) nodes in cache data centers and their counterparts in our main sites.

Alternatives

Q: Does Varnish support HTTPS backends?
A: No, and it probably never will.

-

Q: Could Varnish use stunnel or pound to communicate with second-tier Varnishes in main colos?
A: It's possible, but not preferred, because these solutions operate in userspace and their scalability and resilience are not as trusted as those of IPsec, which is implemented in the Linux kernel.

-

Q: What about using a hardware solution in the routers?
A: This is seen as an expensive solution which also may be less scalable because fewer devices are encrypting and decrypting than in a server-based configuration. It requires trusting the router vendors' IPsec implementation and operating system. Also, although this provides LAN-to-LAN security, it is theoretically inferior to end-to-end connection security. However, a router-based solution would be transparent to servers, potentially decreasing complexity.

Method

We have written a generic Puppet module for Strongswan, and a role class with site-specific configuration details. To avoid the complexity of SSL key generation and distribution, we reuse the Puppet client's certificates.

Administration

Emergency shutdown

To disable all IPsec connections on a node, use /usr/local/sbin/ipsec-global down. This command may be executed via Salt, and is a wrapper for /usr/sbin/ipsec which issues a "down" command for each connection configured in /etc/ipsec.conf. By default it will issue "down" commands as blocking operations to verify success, while executing "up" and "status" in non-blocking mode because these commands can take multiple minutes to complete due to timeouts and retries. Explicit blocking and non-blocking modes are available for all actions. See the help output (-h) for details.
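
As a rough illustration of what such a wrapper does (a hypothetical sketch, not the actual ipsec-global script), the following Python collects connection names from ipsec.conf-style text and builds one "down" command per connection. The function names, the sample configuration, and the connection names in it are assumptions made for this example.

```python
import re

def connection_names(conf_text):
    """Return conn names declared in ipsec.conf-style text, skipping %default."""
    names = []
    for line in conf_text.splitlines():
        # Section headers start in column 0: "conn <name>"
        match = re.match(r"conn\s+(\S+)", line)
        if match and match.group(1) != "%default":
            names.append(match.group(1))
    return names

def down_commands(conf_text, ipsec="/usr/sbin/ipsec"):
    """Build one 'ipsec down <conn>' argument list per configured connection."""
    return [[ipsec, "down", name] for name in connection_names(conf_text)]

# Hypothetical configuration, using documentation-range addresses.
SAMPLE_CONF = """\
config setup
    charondebug="cfg 2, dmn 2"

conn %default
    auto=start

conn cp1001_v4
    right=192.0.2.10

conn cp1001_v6
    right=2001:db8::10
"""

if __name__ == "__main__":
    for cmd in down_commands(SAMPLE_CONF):
        print(" ".join(cmd))
```

Each argument list could be handed to subprocess.run to execute it. The blocking and non-blocking modes of the real wrapper are not modeled here.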

Concepts

Terminology

Left: by convention, the local host
Right: by convention, the remote host
IKE: Internet Key Exchange protocol. Builds on ISAKMP. See: RFC 2409
ISAKMP: Internet Security Association and Key Management Protocol. See: RFC 2408
OE: Opportunistic Encryption - How IPsec-enabled hosts might establish SAs with any other capable hosts they encounter without specific configuration, by retrieving the remote host's key from DNS, Kerberos or other OOB method. Implementations are not mature or widespread.
SA: Security Association - an established session between two IPsec endpoints
XFRM ("transform"): The Linux kernel's IP framework for transforming packets (such as encrypting their payloads).

Specifications

IPsec is defined in a series of RFCs:

https://en.wikipedia.org/wiki/IPsec#Standards_Track

Further RFCs provide more info:

https://en.wikipedia.org/wiki/IPsec#Informational_RFCs

IPsec was originally mandatory for IPv6, but has been downgraded from MUST to SHOULD to make implementation easier for embedded devices (see RFC 6434).

Transport vs tunnel mode

IPsec has two modes of operation, tunnel and transport.

In tunnel mode, the entire IP packet is encrypted and encapsulated as the payload of a new, larger packet (possibly causing MTU problems).
In transport mode, the original IP packet header is maintained and only the payload is encrypted.
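
To make the size difference between the two modes concrete, here is a minimal sketch that estimates the length of an ESP packet for a given payload. The function name and defaults are assumptions for illustration: the defaults model AES-GCM as described in RFC 4106 (8-byte IV, 16-byte ICV, 4-byte alignment) and an IPv4 inner header without options. Other ciphers need different values.

```python
def esp_length(payload_len, mode="transport", iv_len=8, icv_len=16,
               block=4, inner_ip_header=20):
    """Estimate the ESP packet length (SPI through ICV) for a given payload.

    payload_len: bytes following the original IP header (e.g. an ICMP message)
    mode: "transport" protects only the payload; "tunnel" also wraps the
          original IP header (inner_ip_header bytes, 20 for option-free IPv4)
    """
    if mode not in ("transport", "tunnel"):
        raise ValueError("mode must be 'transport' or 'tunnel'")
    protected = payload_len + (inner_ip_header if mode == "tunnel" else 0)
    # ESP trailer: padding + 1 byte pad length + 1 byte next header,
    # padded so that the protected data ends on a block boundary.
    unpadded = protected + 2
    padded = -(-unpadded // block) * block  # round up to a multiple of block
    # 8 bytes of ESP header (SPI + sequence number), then IV, data, ICV.
    return 8 + iv_len + padded + icv_len

if __name__ == "__main__":
    for mode in ("transport", "tunnel"):
        print(mode, esp_length(64, mode))
```

In tunnel mode the result also travels behind a new outer IP header, so the on-wire growth is larger still than the difference this function reports.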

Host-host vs site-site

IPsec may be implemented as a host-to-host connection (end-to-end), or between specific segments (router-to-router aka site-to-site). Either scenario may be accomplished using tunnel or transport mode.

A host-to-site connection, such as a laptop connecting remotely to a corporate network bastion, is commonly referred to as a "roadwarrior" configuration.

Host to host connection in transport mode is described here: http://www.strongswan.org/uml/testresults/ikev2/host2host-transport/index.html

Connection initiation

In order for two hosts to communicate using IPsec, appropriate Security Associations must first be configured. This happens in two phases.

Phase 1: Main or Aggressive Mode - Negotiate how IKE should be protected
1. Agree upon algorithms & hashes to secure IKE communications
2. Perform a Diffie-Hellman exchange to generate a shared secret session key, and exchange nonces
3. Verify the other side's identity and establish the "main" SA (IKE SA)
- Main Mode vs Aggressive mode:
  - Main mode takes six packets (request & response for each of the three steps above), and is secure against sniffing.
  - Aggressive mode uses only three packets: the first one concatenates the proposed IKE SA values, the DH public key, a nonce, and the sender's identity into a single request packet. The recipient sends back everything needed to complete the exchange, and finally the initiator confirms success. This is faster, but information has been exchanged before there is a secure channel. This means that sniffed traffic can reveal who formed the new SA.
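
The Diffie-Hellman exchange in step 2 of Phase 1 can be sketched in a few lines. This is a toy illustration only: the modulus, the base, and the function names are assumptions chosen for brevity, and they are not one of the standardized IKE groups. It shows only that both sides arrive at the same secret while exchanging nothing but public values.

```python
import secrets

# Toy parameters: a Mersenne prime modulus and a small base. Real IKE uses
# standardized groups, so these values are for illustration only and are
# NOT secure.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # 2 <= private <= P - 2
    return private, pow(G, private, P)

def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)

if __name__ == "__main__":
    a_priv, a_pub = keypair()   # initiator
    b_priv, b_pub = keypair()   # responder
    # Only a_pub and b_pub cross the wire; both sides derive the same secret.
    assert shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub)
    print("shared secret established")
```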

Phase 2: Quick Mode - Negotiate how IPsec traffic should be protected
- Phase 2 has only one mode
1. Negotiate "child" SA (IPsec SA) parameters, protected by the IKE SA established in Phase 1
2. Establish IPsec Security Associations
3. Periodically renegotiate IPsec SAs

IPv4 vs IPv6

Applications make their own decisions about whether to use v4 or v6 transport, based on configuration (hostname vs IP, etc.) as well as their implementation of the v4/v6 selection algorithm. RFC 6555 ("Happy Eyeballs") specifies that applications should simultaneously attempt v4 + v6 and select whichever connection ACKs first, to avoid timeouts waiting for v6 before v4 replies. For these reasons, in general the best practice seems to be to configure both v4 and v6 SAs, so that traffic will be encrypted regardless of which transport an application uses.

In the case of Strongswan, this means specifying left/right pairs by IP rather than hostname.
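
A minimal sketch of generating one connection per address family follows. The function name, the "_v4"/"_v6" naming scheme, and the addresses (taken from documentation ranges) are assumptions for illustration, not the layout our Puppet module produces.

```python
import ipaddress

def conn_stanzas(name, left_addrs, right_addrs):
    """Pair local and remote addresses of the same IP version into conn stanzas."""
    stanzas = []
    for version in (4, 6):
        lefts = [a for a in left_addrs if ipaddress.ip_address(a).version == version]
        rights = [a for a in right_addrs if ipaddress.ip_address(a).version == version]
        # Only emit a stanza when both ends have an address of this family.
        if lefts and rights:
            stanzas.append(
                f"conn {name}_v{version}\n"
                f"    left={lefts[0]}\n"
                f"    right={rights[0]}\n"
            )
    return stanzas

if __name__ == "__main__":
    local = ["192.0.2.1", "2001:db8::1"]
    remote = ["192.0.2.10", "2001:db8::10"]
    print("\n".join(conn_stanzas("cp1001", local, remote)))
```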

Cipher selection

For IPsec security associations, four cryptographic algorithms are negotiated:

Encryption algorithm
Integrity (authentication) algorithm
Pseudorandom Function (PRF)
Key exchange algorithm

For lists of algorithms in each category which MUST and SHOULD be implemented, see RFC 7321. For all supported transform type values, see IANA.

Encryption

Some algorithms implement a "combined mode" for integrated authentication and encryption. RFC 7321 says "This document encourages the use of authenticated encryption algorithms because they can provide significant efficiency and throughput advantages, and the tight binding between authentication and encryption can be a security advantage"

RFC 5116 defines combined mode algorithms based on AES-GCM and AES-CCM. RFC 7321 states that "AES-GCM RFC 4106 brings significant performance benefits, has been incorporated into IPsec recommendations RFC 6379, and has emerged as the preferred authenticated encryption method in IPsec and other standards." GCM is also considered better suited to parallelized computation, and Intel's hardware acceleration found in the AES-NI extension includes optimizations for GCM.

Selected: 128 bit AES-GCM with 128 bit (16 byte) Integrity Check Value

Integrity

When using an authenticated encryption algorithm, a discrete integrity protection algorithm is not needed. If specified in Strongswan's config, this setting is ignored. However, setting this value to 'null' is incorrect as it will be interpreted as a selector for the Null encryption cipher. Therefore the proper configuration is simply to not specify any integrity algorithm.

Selected: (none)

Pseudorandom function

IKEv2 supports multiple pseudorandom function (PRF) algorithms. Strongswan supports PRFs based on MD5, SHA-1, SHA-2, and AES. Due to published attacks on MD5 and SHA-1, we will select between the AES and SHA-2 variants. The AES-based PRF, AES_XCBC, has not proved popular with implementors. RFC 7321 downgrades it from SHOULD+ to SHOULD, hence we select the PRF based on SHA-2. SHA_224 and SHA_256 use 32-bit words, while SHA_384 and SHA_512 use 64-bit words; the 64-bit variants are therefore a natural fit for 64-bit server CPUs.

Selected: SHA2_384 PRF

Key exchange

The Diffie-Hellman key exchange algorithm is defined in several "groups", some of which use elliptic curve cryptography. Currently, ECDH is favored but there is some distrust of the curves defined by NIST. An alternative is the set of curves defined by the European Brainpool working group. US federal encryption standards allow encryption up to "TOP SECRET" level using 384-bit keys.

Selected: Brainpool Elliptic Curve Group 29 (384-bit key)

Implementations

Linux

The history of IPsec on Linux is convoluted due to forks and parallel development.

The first IPsec implementations for Linux were Kame and Freeswan (FreeS/WAN), and they were incompatible.
Kame project:
- Drivers were merged into the kernel
- Userspace tools are called `ipsec-tools`
- IKE daemon is called 'Racoon'
Freeswan project:
- Was more popular
- Required its own out-of-tree kernel module
- Later it gained the ability to use Kame's kernel drivers
Kame project is no longer active
Freeswan project is no longer active
Strongswan and Openswan are forks from Freeswan
Libreswan is a fork of Openswan
Current popular choices for IKE are Strongswan and Libreswan
- These still use the Kame kernel drivers, so there is no need to compile kernel modules

Kernel configuration

Parameters

FIXME: /proc/sys/net/core/xfrm_larval_drop

It seems like this setting tells the kernel to drop packets while IPsec is coming up rather than freezing them, which can make apps behave badly. Need to look up proper documentation; all I've found so far is old mailing list chatter like this: https://marc.info/?l=bind-users&m=120910796110563&w=2

FIXME: search for other relevant sysctls

Status

ip -s xfrm state
ip -s xfrm policy

Strongswan

https://wiki.strongswan.org/projects/strongswan
https://www.strongswan.org/documentation.html
https://wiki.strongswan.org/projects/strongswan/wiki/UserDocumentation
https://wiki.strongswan.org/projects/strongswan/wiki/IntroductionTostrongSwan
- "Strongswan is basically a keying daemon which uses IKE to establish SAs between peers. ... The actual IPsec traffic is not handled by Strongswan."
https://wiki.strongswan.org/projects/strongswan/wiki/IKEv2Examples
http://www.strongswan.org/uml/testresults/ikev2/host2host-transport/index.html

Commands

Service control

Ubuntu 12.04 Precise: SysV init: /etc/init.d/ipsec
Ubuntu 14.04 Trusty: Upstart: /etc/init/strongswan
Debian 8 Jessie: Provides both /etc/init.d/ipsec and /lib/systemd/system/strongswan.service
- This confuses the Puppet Service provider so we had to dpkg-divert --divert /etc/init.d/ipsec-disabled /etc/init.d/ipsec
- Systemd gets confused when "ipsec restart" is run. This command should never be issued on a systemd-based system! Instead, use the proper "service ipsec restart".

Status

ipsec status
ipsec statusall
ipsec listcerts
ipsec listcacerts

Verifying operation

To test: start a ping, and tcpdump:

alice: ping bob.eqiad.wmnet
bob: tcpdump -i eth0 host alice.esams.wmnet

Failure: packets are not encrypted so tcpdump can identify the payload as ICMP:

 17:17:39.937199 IP alice.esams.wmnet > bob.eqiad.wmnet: ICMP echo request, id 23424, seq 1, length 64
 17:17:39.937219 IP bob.eqiad.wmnet > alice.esams.wmnet: ICMP echo reply, id 23424, seq 1, length 64

Success: all tcpdump can see is an Encapsulating Security Payload (ESP):

 17:37:42.229591 IP alice.esams.wmnet > bob.eqiad.wmnet: ESP(spi=0xceb269f1,seq=0x1), length 116
 17:37:42.229653 IP bob.eqiad.wmnet > alice.esams.wmnet: ESP(spi=0xc9cf28cd,seq=0x1), length 116
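
The same check can be automated. The sketch below classifies tcpdump output lines like the ones above; the function names and the regular expression are assumptions written against these sample lines, and tcpdump output may differ with other options or versions.

```python
import re

# Matches the protocol field that tcpdump prints after "src > dst:".
ESP_LINE = re.compile(r" > \S+: ESP\(spi=0x[0-9a-f]+,seq=0x[0-9a-f]+\)")

def is_encrypted(tcpdump_line):
    """True if tcpdump decoded the packet as ESP rather than a cleartext protocol."""
    return bool(ESP_LINE.search(tcpdump_line))

def cleartext_lines(lines):
    """Return the captured lines that were not ESP, i.e. evidence of failure."""
    return [line for line in lines if line.strip() and not is_encrypted(line)]

if __name__ == "__main__":
    capture = [
        "17:17:39.937199 IP alice.esams.wmnet > bob.eqiad.wmnet: ICMP echo request, id 23424, seq 1, length 64",
        "17:37:42.229591 IP alice.esams.wmnet > bob.eqiad.wmnet: ESP(spi=0xceb269f1,seq=0x1), length 116",
    ]
    for line in cleartext_lines(capture):
        print("NOT ENCRYPTED:", line)
```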

Logging

APP: appears in logs but is not a valid entry for charondebug config.
- presence makes the daemon silently not start
ASN: low level encoding/decoding (ASN.1, X.509, etc.)
- boring
CFG: configuration management and plugins
- shows details of config file parsing including assumed defaults for undeclared values.
- somewhat interesting
- probably only seen during startup?
CHD: CHILD_SA / IPsec SA
- associates src/dest with "SPI" hex value
- boring
DMN: main daemon setup/cleanup/signal handling
- only shutdown and startup messages
- these are useful markers
ENC: Packet encoding/decoding encryption/decryption operations
- interesting only at level 1
ESP: libipsec library messages
- no logging even at level 2
- package strongswan-plugin-kernel-libipsec not installed
IKE: IKE_SA/ISAKMP SA
- note similarity to CHD
- task activation: CHILD_CREATE, IKE_INIT, etc
- authentication via RSA sig success
- CHILD_SA establish/close/delete
- reauthentication
- send/receive delete messages
- possibly interesting but let's see if we can get the same utility with less verbosity at level 1
- initial config i copied set this to 2
IMC: Integrity Measurement Collector
- nothing logged even at level 2
IMV: Integrity Measurement Verifier
- nothing logged even at level 2
JOB: Jobs queuing/processing and thread pool management
- only this at level 1:
- [JOB] spawning 16 worker threads
- boring
KNL: IPsec/Networking kernel interface
- boring
LIB: libstrongswan library messages
- details plugin loading and unmet deps
- otherwise boring
MGR: IKE_SA manager, handling synchronization for IKE_SA access
- check-in and check-out messages.
- boring.
NET: IKE network communication
- "received packet", "waiting for data on sockets".
- boring.
PTS: Platform Trust Service
- no logging even at level 2
TLS: libtls library messages
- no logging even at level 2
TNC: Trusted Network Connect
- no logging even at level 2

Recommended setting

 charondebug="cfg 2, dmn 2"
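
Because an invalid entry such as "app" makes the daemon silently fail to start, it may be worth validating the string before deploying it. The following is a minimal sketch: the function name is an assumption, the subsystem list is copied from the notes above, lowercase names as in the recommended setting are assumed, and the accepted level range of -1 to 4 follows the Strongswan logger documentation.

```python
# Subsystems listed in the Logging notes above, minus the invalid "app".
VALID_SUBSYSTEMS = {
    "asn", "cfg", "chd", "dmn", "enc", "esp", "ike", "imc", "imv",
    "job", "knl", "lib", "mgr", "net", "pts", "tls", "tnc",
}

def check_charondebug(value):
    """Return a list of problems found in a charondebug string (empty if none)."""
    problems = []
    for item in value.split(","):
        parts = item.split()
        if len(parts) != 2:
            problems.append(f"malformed entry: {item.strip()!r}")
            continue
        subsystem, level = parts
        if subsystem not in VALID_SUBSYSTEMS:
            problems.append(f"unknown subsystem: {subsystem}")
        try:
            level_ok = -1 <= int(level) <= 4
        except ValueError:
            level_ok = False
        if not level_ok:
            problems.append(f"bad level for {subsystem}: {level}")
    return problems

if __name__ == "__main__":
    print(check_charondebug("cfg 2, dmn 2"))
    print(check_charondebug("cfg 2, app 1"))
```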

Related info

actual ipsec traffic does not pass through charon and hence the above will not influence its logging
notes on reducing logging at compile time for perf: https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
what do we need to know from logging?
- do the transports continue to function in the face of packet loss or network interruption?
- for this reason i may need to firewall non-ipsec traffic between my test hosts so that failure is evident
- though my understanding of how SAs are stored in-kernel makes it seem unlikely that unencrypted traffic would even be attempted unless the SAs are removed
- assumption: SAs will only be removed/modified by charon

Monitoring

Availability

Existing Nagios/Icinga plugins either required separate configurations for each SA, or simply counted them to determine health. The former seemed too burdensome and the latter seemed insufficient, so we wrote our own, which parses ipsec statusall output to identify the state of each configured connection: established, connecting, or not connected.
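
The general approach can be sketched as follows. This is not the actual plugin: the function names are hypothetical, and the regular expression assumes IKE SA lines of the form "name[N]: ESTABLISHED ..." or "name[N]: CONNECTING, ...", which should be checked against real ipsec statusall output before relying on it.

```python
import re

def connection_states(configured, statusall_text):
    """Map each configured connection name to established/connecting/not connected."""
    states = {name: "not connected" for name in configured}
    # Assumed IKE SA line format: "  name[3]: ESTABLISHED 5 minutes ago, ..."
    pattern = re.compile(r"^\s*(\S+)\[\d+\]:\s+(ESTABLISHED|CONNECTING)\b", re.MULTILINE)
    for match in pattern.finditer(statusall_text):
        name, state = match.group(1), match.group(2).lower()
        # Never downgrade a connection that already has an established SA.
        if name in states and states[name] != "established":
            states[name] = state
    return states

def nagios_status(states):
    """Return (exit_code, message) using the usual 0=OK, 2=CRITICAL plugin codes."""
    broken = sorted(n for n, s in states.items() if s != "established")
    if broken:
        return 2, "CRITICAL: not established: " + ", ".join(broken)
    return 0, f"OK: {len(states)} connections established"

# Hypothetical output fragment for illustration.
SAMPLE_STATUS = """\
Security Associations (1 up, 1 connecting):
   cp1001_v4[1]: ESTABLISHED 5 minutes ago, 192.0.2.1[192.0.2.1]...192.0.2.10[192.0.2.10]
   cp1001_v6[2]: CONNECTING, 2001:db8::1[%any]...2001:db8::10[%any]
"""

if __name__ == "__main__":
    states = connection_states(["cp1001_v4", "cp1001_v6", "cp1002_v4"], SAMPLE_STATUS)
    print(nagios_status(states))
```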

FIXME: link

Performance

Data sources it would be useful to collect via Carbon/Graphite:

Request and response counters for IKE init and auth events, from ipsec listcounters
Counters for bytes and packets plus replays and failures, from ip -s xfrm state
- The related command ip -s xfrm policy duplicates most of this output and adds some extra information, but 'state' appears to be the better source.
Thread and memory stats from the Status section of ipsec statusall:
- malloc: sbrk 2568192, mmap 0, used 454432, free 2113760
- worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 6
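
As a starting point for collecting the kernel counters, here is a minimal parsing sketch. The function name and the sample text are assumptions: the sample imitates the layout of ip -s xfrm state output from memory and is abbreviated, so the patterns should be verified against the real command on the target kernel.

```python
import re

def parse_xfrm_state(text):
    """Extract per-SA counters from `ip -s xfrm state` style output."""
    sas = []
    current = None
    for line in text.splitlines():
        # Each SA starts with an unindented "src <addr> dst <addr>" line.
        header = re.match(r"src (\S+) dst (\S+)", line)
        if header:
            current = {"src": header.group(1), "dst": header.group(2)}
            sas.append(current)
            continue
        if current is None:
            continue
        spi = re.search(r"\bspi (0x[0-9a-f]+)", line)
        if spi:
            current["spi"] = spi.group(1)
        traffic = re.search(r"(\d+)\(bytes\), (\d+)\(packets\)", line)
        if traffic:
            current["bytes"] = int(traffic.group(1))
            current["packets"] = int(traffic.group(2))
        stats = re.search(r"replay-window (\d+) replay (\d+) failed (\d+)", line)
        if stats:
            current["replay"] = int(stats.group(2))
            current["failed"] = int(stats.group(3))
    return sas

# Abbreviated, assumed sample of the command's output.
SAMPLE_XFRM = """\
src 192.0.2.1 dst 192.0.2.10
\tproto esp spi 0xceb269f1(3467799025) reqid 1(0x00000001) mode transport
\tlifetime current:
\t  840(bytes), 10(packets)
\tstats:
\t  replay-window 0 replay 0 failed 0
"""

if __name__ == "__main__":
    for sa in parse_xfrm_state(SAMPLE_XFRM):
        print(sa)
```

The resulting dictionaries could then be mapped to Graphite metric names keyed on source, destination, and SPI.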

Configuration

https://wiki.strongswan.org/projects/strongswan/wiki/IpsecConf

Setup Section

https://wiki.strongswan.org/projects/strongswan/wiki/ConfigSetupSection
strictcrl: do we require a fresh cert revocation list? default no.
- i guess we want to enable this unless it causes problems
cachecrls: default no, i guess that's the most secure unless it causes perf problems. mentions that they may be fetched via http or ldap.
uniqueids: confusing. seems like we want the default: yes
charondebug: annoyingly terse.
https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
- a howto suggested: charondebug="cfg 2, dmn 2, ike 2, net 2"

Connection Section

https://wiki.strongswan.org/projects/strongswan/wiki/ConnSection
auto: i see no advantage to any other setting besides 'start'. 'route' would enable on demand, but yeah no advantage unless memory usage or rekey traffic becomes a concern. if they are, i'd argue that the software sucks or we're Doing It Wrong.
closeaction: what to do if remote peer unexpectedly closes CHILD_SA. doesn't describe what 'hold' does. mentions interaction with uniqueids checking, which i guess is the reason it defaults to 'none'.
compress: obviously gonna cost some cpu but i wonder how much it would affect intercontinental traffic, and especially whether it's worth it for short http blips
dpdaction: hmm perhaps we want 'restart' instead of 'none'?
inactivity: no default?
esp: default cipher suite list looks ok but i should check with csteipp and giuseppe before going to prod.
forceencaps:
- hmm, ESP is incompat with NAT because it encrypts the TCP/UDP header which NAT wants to modify.
solution: wrap the ESP in UDP.
http://www.watchguard.com/training/vbasics5/WG_VPN/vpn21.htm
https://www.rfc-editor.org/rfc/rfc3948 + 3947
- so this is different than tunnel mode because it's just wrapping ESP in UDP instead of encapsulating the entire original packet.
https://en.wikipedia.org/wiki/NAT_traversal#IPsec_traversal_across_NAT
- mentions that NAT-T may be used to achieve OE, but i guess that wouldn't apply to our situation anyway unless we want to start speaking OE with users' machines
ike: another cipher suite declaration
keyingretries: interesting that the testsuite set it to 1. i guess for fast fail during testing. default is 3 but maybe we want '%forever'
keylife/lifetime: same as above, testsuite set to 20m instead of default 1h.
margintime/marginbytes/marginpackets: ah this isn't what i was expecting: it starts attempting to re-key this long before the lifetime expires, default 9m. lifetime above is max allowable.
mark/mark_in/mark_out: i guess this is just to tag connections with a unique id for monitoring. /proc/net/xfrm_stat seems to have disappeared. i see /sys/module/xfrm* but not sure what stats are interesting, if any.
mobike: well we don't need mobile ip so i guess it makes sense to disable it, as the test suite does.
rekeyfuzz: 100% sounds reasonable but we might have to raise this due to many peers
replay_window: not much explanation but sounds like the sort of thing that might need to be increased for high-bdp connections
- what is the netlink backend? what are the alternatives?
- https://wiki.strongswan.org/projects/strongswan/wiki/Kernel-libipsec
- netlink is the default
- there's also pfkey (kernel, status: experimental) and libipsec (all-userspace)
- pf_key is some kind of special socket for ipsec to exchange SADB messages with the kernel. unclear what the advantage is, if any.
tfc: pads ESP to MTU for Traffic Flow Confidentiality. would increase net traffic but i guess it's useful for paranoia
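
The interaction of lifetime, margintime, and rekeyfuzz noted above is easier to see with numbers. This sketch uses the formula given in the ipsec.conf documentation, rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz)). The function names are assumptions for illustration, and times are in seconds.

```python
import random

def rekey_window(lifetime, margintime, rekeyfuzz=1.0):
    """Return (earliest, latest) rekey times in seconds after SA establishment."""
    # Largest random extension of the margin gives the earliest rekey.
    earliest = lifetime - (margintime + margintime * rekeyfuzz)
    # No random extension gives the latest rekey.
    latest = lifetime - margintime
    return earliest, latest

def pick_rekey_time(lifetime, margintime, rekeyfuzz=1.0):
    """Draw one rekey time uniformly from the window, as a peer might."""
    earliest, latest = rekey_window(lifetime, margintime, rekeyfuzz)
    return random.uniform(earliest, latest)

if __name__ == "__main__":
    # Defaults mentioned above: lifetime 1h, margintime 9m, rekeyfuzz 100%.
    earliest, latest = rekey_window(3600, 540, 1.0)
    print(f"rekey between {earliest / 60:.0f} and {latest / 60:.0f} minutes")
```

With the defaults, rekeying starts somewhere between 42 and 51 minutes into a one-hour lifetime. Raising rekeyfuzz widens that window, which spreads out rekey traffic when there are many peers.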

Cipher proposals

See above: #Cipher selection

IKE master SA

Notation: encryption-integrity[-prf]-dhgroup

 ike=aes128gcm16-prfsha384-ecp384bp!

Yields IKE proposal: AES_GCM_16_128/PRF_HMAC_SHA2_384/ECP_384_BP

Use combined mode authenticated encryption AES_GCM_16_128
A separate integrity algorithm is not used because authentication is included in AES_GCM
Use SHA2_384's PRF
Use Brainpool curves for ECDH, with a 384-bit key
The strict flag (!, exclamation mark) is used to restrict the daemon to propose and accept only the specified cipher proposal without appending or accepting default ciphers.

ESP child SAs

Notation: encryption-integrity[-dhgroup][-esnmode]

 esp=aes128gcm128-ecp384bp-noesn!

Yields ESP proposal: AES_GCM_16_128/ESN

Use the same authenticated encryption selected for IKE SA
Do not define a separate integrity algorithm, identically to the IKE SA
The PRF is not specified for child SAs because it is inherited from the IKE SA
Although the output of 'ipsec statusall' does not confirm use of ecp384bp, this is the only key exchange algorithm accepted due to the strict flag (!)
Extended Sequence Number (ESN: RFC 4304) mode is disabled due to a kernel crash as of Linux 3.19
The strict flag (!, exclamation mark) is used to restrict cipher selection in the same way as in the IKE SA
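
The two notations above can be captured in one small helper. This is only a sketch with a hypothetical function name and parameters. It joins the keywords in the order given by the notation lines and reproduces the ike= and esp= values shown in this section.

```python
def proposal(encryption, dh_group, prf=None, integrity=None, esn=None, strict=True):
    """Join algorithm keywords into an ike= or esp= proposal string."""
    parts = [encryption]
    if integrity:            # omitted for combined-mode ciphers such as AES-GCM
        parts.append(integrity)
    if prf:                  # IKE only; child SAs inherit the PRF
        parts.append(prf)
    parts.append(dh_group)
    if esn:                  # ESP only: "esn" or "noesn"
        parts.append(esn)
    # The trailing "!" is the strict flag described above.
    return "-".join(parts) + ("!" if strict else "")

if __name__ == "__main__":
    print("ike=" + proposal("aes128gcm16", "ecp384bp", prf="prfsha384"))
    print("esp=" + proposal("aes128gcm128", "ecp384bp", esn="noesn"))
```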

Left/Right

surprise: left/right is mostly convention, it checks both against provisioned interfaces on startup. however if neither matches it assumes left is local.
left/right: can also take value '%any' or a range or subnet
leftca/rightca: i guess this is where i would require wmf ca cert to be in the chain.. if that terminology is correct
leftfirewall/rightfirewall: 'yes' says that the hosts are blocking traffic to remote and the blocks should be removed once the connection is established. in our case blocking unencrypted traffic is not a priority so i don't see a need for this.
leftid/rightid: looks like @fqdn syntax is outdated, it was to avoid resolving to IP. seems like we don't need this field at all since it defaults to left/right.

CA Section

https://wiki.strongswan.org/projects/strongswan/wiki/CaSection
- i don't have one of these
- optional, used to assign special parameters to a CA

Network

Because of a bug or limitation in IPsec and Strongswan, the MTU of all the IPsec links between Varnish servers has been locked to 1450. See T195365 for troubleshooting and implementation details.