This page contains historical information. It may be outdated or unreliable.



Due to the way HTTPS sessions are terminated, we will use IPsec to encrypt traffic between the caching proxy (Varnish) nodes in cache data centers and their counterparts in our main sites.


  • Q: Does Varnish support HTTPS backends?
  • A: No, and it probably never will.


  • Q: Could Varnish use stunnel or pound to communicate with second-tier Varnishes in main colos?
  • A: It's possible, but not preferred, because these solutions operate in userspace and their scalability and resilience are not as trusted as IPsec, which is implemented in the Linux kernel.


  • Q: What about using a hardware solution in the routers?
  • A: This is seen as an expensive solution which also may be less scalable because fewer devices are encrypting and decrypting than in a server-based configuration. It requires trusting the router vendors' IPsec implementation and operating system. Also, although this provides LAN-to-LAN security, it is theoretically inferior to end-to-end connection security. However, a router-based solution would be transparent to servers, potentially decreasing complexity.


We have written a generic Puppet module for Strongswan, and a role class with site-specific configuration details. To avoid the complexity of SSL key generation and distribution, we reuse the Puppet client's certificates.


Emergency shutdown

To disable all IPsec connections on a node, use /usr/local/sbin/ipsec-global down. This command may be executed via Salt, and is a wrapper for /usr/sbin/ipsec which issues a "down" command for each connection configured in /etc/ipsec.conf. By default it will issue "down" commands as blocking operations to verify success, while executing "up" and "status" in non-blocking mode because these commands can take multiple minutes to complete due to timeouts and retries. Explicit blocking and non-blocking modes are available for all actions. See the help output (-h) for details.
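The wrapper's source is not reproduced on this page. As a rough sketch of the behaviour described above, the following Python builds one "ipsec down" command per connection found in ipsec.conf-style text. The function names and the sample configuration are hypothetical, and the real script may work differently.

```python
import re

def connection_names(ipsec_conf_text):
    """Return the conn names declared in ipsec.conf text, skipping %default."""
    names = []
    for line in ipsec_conf_text.splitlines():
        # Section headers such as "conn NAME" start in the first column.
        match = re.match(r"^conn\s+(\S+)", line)
        if match and match.group(1) != "%default":
            names.append(match.group(1))
    return names

def build_commands(ipsec_conf_text, action="down"):
    """Build one /usr/sbin/ipsec invocation per configured connection."""
    return [["/usr/sbin/ipsec", action, name]
            for name in connection_names(ipsec_conf_text)]

# Hypothetical sample configuration, for illustration only.
SAMPLE_CONF = """\
config setup
    charondebug="cfg 2, dmn 2"

conn %default
    type=transport

conn alice_v4
    left=192.0.2.1
    right=192.0.2.2

conn alice_v6
    left=2001:db8::1
    right=2001:db8::2
"""

for command in build_commands(SAMPLE_CONF):
    print(" ".join(command))
```

Each command list could be passed to subprocess.run() for a blocking call, or to subprocess.Popen() for a non-blocking one.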



  • Left: by convention, the local host
  • Right: by convention, the remote host
  • IKE: Internet Key Exchange protocol. Builds on ISAKMP. See: RFC 2409
  • ISAKMP: Internet Security Association and Key Management Protocol. See: RFC 2408
  • OE: Opportunistic Encryption - How IPsec-enabled hosts might establish SAs with any other capable hosts they encounter without specific configuration, by retrieving the remote host's key from DNS, Kerberos or other OOB method. Implementations are not mature or widespread.
  • SA: Security Association - an established session between two IPsec endpoints
  • XFRM ("transform"): The Linux kernel's IP framework for transforming packets (such as encrypting their payloads).


IPsec is defined in a series of RFCs:

Further RFCs provide more info:

IPsec was originally mandatory for IPv6, but has been downgraded from MUST to SHOULD to make implementation easier for embedded devices:

Transport vs tunnel mode

IPsec has two modes of operation, tunnel and transport.

  • In tunnel mode, the entire IP packet is encrypted and encapsulated as the payload of a new, larger packet (possibly causing MTU problems)
  • In transport mode, the original IP packet header is maintained and only the payload is encrypted

Host-host vs site-site

IPsec may be implemented as a host-to-host connection (end-to-end), or between specific segments (router-to-router aka site-to-site). Either scenario may be accomplished using tunnel or transport mode.

A host-to-site connection, such as a laptop connecting remotely to a corporate network bastion, is commonly referred to as a "roadwarrior" configuration.

Host to host connection in transport mode is described here: http://www.strongswan.org/uml/testresults/ikev2/host2host-transport/index.html

Connection initiation

In order for two hosts to communicate using IPsec, appropriate Security Associations must first be configured. This happens in two phases.

  1. Phase 1: Main or Aggressive Mode - Negotiate how IKE should be protected
    1. Agree upon algorithms & hashes to secure IKE communications
    2. Perform a Diffie-Hellman exchange to generate a shared secret session key, and pass nonces
    3. Verify the other side's identity and establish the "main" SA (IKE SA)
    • Main Mode vs Aggressive mode:
      • Main mode takes six packets (request & response for each of the three steps above), and is secure against sniffing.
      • Aggressive mode uses only three packets: the first one concatenates the proposed IKE SA values, the DH public key, a nonce, and the sender's identity into a single request packet. The recipient sends back everything needed to complete the exchange, and finally the initiator confirms success. This is faster, but information has been exchanged before there is a secure channel. This means that sniffed traffic can reveal who formed the new SA.
  2. Phase 2: Quick Mode - Negotiate how IPsec traffic should be protected
    • Phase 2 has only one mode
    1. Negotiate "child" SA (IPsec SA) parameters, protected by the IKE SA established in Phase 1
    2. Establish IPsec Security Associations
    3. Periodically renegotiate IPsec SAs
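To make the Diffie-Hellman step of Phase 1 concrete, here is a toy exchange with deliberately tiny numbers. It only illustrates how both sides arrive at the same shared secret without sending it over the wire. It is not IKE, and the prime, generator and private values are arbitrary illustrative choices that offer no security.

```python
# Toy Diffie-Hellman exchange. Real IKE uses large standardized groups.
P = 23  # public prime modulus (assumed toy value)
G = 5   # public generator (assumed toy value)

def public_value(private_key):
    """Value sent to the peer: G^private mod P."""
    return pow(G, private_key, P)

def shared_secret(peer_public, private_key):
    """Secret computed locally from the peer's public value."""
    return pow(peer_public, private_key, P)

initiator_private = 6
responder_private = 15

# Only these two public values cross the network.
initiator_public = public_value(initiator_private)
responder_public = public_value(responder_private)

initiator_secret = shared_secret(responder_public, initiator_private)
responder_secret = shared_secret(initiator_public, responder_private)

print(initiator_secret == responder_secret)
```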

IPv4 vs IPv6

Applications make their own decisions about whether to use v4 or v6 transport, based on configuration (hostname vs IP, etc.) as well as their implementation of the v4/v6 selection algorithm. RFC 6555 ("Happy Eyeballs") specifies that applications should simultaneously attempt v4 + v6 and select whichever connection ACKs first, to avoid timeouts waiting for v6 before v4 replies. For these reasons, in general the best practice seems to be to configure both v4 and v6 SAs, so that traffic will be encrypted regardless of which transport an application uses.

In the case of Strongswan, this means specifying left/right pairs by IP rather than hostname.
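As an illustration of one connection per address family, the sketch below generates a pair of conn stanzas from explicit IP pairs. The conn naming scheme, the helper names and the documentation-range addresses are assumptions for the example. The left, right, type and auto keywords are standard ipsec.conf settings, but the values shown are not copied from our production configuration.

```python
import ipaddress

def conn_stanza(name, left, right):
    """Render one ipsec.conf conn stanza for a single address family."""
    left_ip = ipaddress.ip_address(left)
    right_ip = ipaddress.ip_address(right)
    if left_ip.version != right_ip.version:
        raise ValueError("left and right must use the same IP version")
    return "\n".join([
        "conn %s_v%d" % (name, left_ip.version),
        "    left=%s" % left_ip,
        "    right=%s" % right_ip,
        "    type=transport",
        "    auto=start",
    ])

def dual_stack_stanzas(name, pairs):
    """Render one stanza per (left, right) pair, e.g. one v4 and one v6."""
    return "\n\n".join(conn_stanza(name, left, right) for left, right in pairs)

print(dual_stack_stanzas("alice_bob", [
    ("192.0.2.10", "198.51.100.20"),
    ("2001:db8:1::10", "2001:db8:2::20"),
]))
```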

Cipher selection

For IPsec security associations, four cryptographic algorithms are negotiated:

  • Encryption algorithm
  • Integrity (authentication) algorithm
  • Pseudorandom Function (PRF)
  • Key exchange algorithm

For lists of algorithms in each category which MUST and SHOULD be implemented, see RFC 7321. For all supported transform type values, see IANA.


Some algorithms implement a "combined mode" for integrated authentication and encryption. RFC 7321 says "This document encourages the use of authenticated encryption algorithms because they can provide significant efficiency and throughput advantages, and the tight binding between authentication and encryption can be a security advantage"

RFC 5116 defines combined mode algorithms based on AES-GCM and AES-CCM. RFC 7321 states that "AES-GCM RFC 4106 brings significant performance benefits, has been incorporated into IPsec recommendations RFC 6379, and has emerged as the preferred authenticated encryption method in IPsec and other standards." GCM is also considered better suited to parallelized computation, and Intel's hardware acceleration found in the AES-NI extension includes optimizations for GCM.

Selected: 128 bit AES-GCM with 128 bit (16 byte) Integrity Check Value


When using an authenticated encryption algorithm, a discrete integrity protection algorithm is not needed. If specified in Strongswan's config, this setting is ignored. However, setting this value to 'null' is incorrect as it will be interpreted as a selector for the Null encryption cipher. Therefore the proper configuration is simply to not specify any integrity algorithm.

Selected: (none)

Pseudorandom function

IKEv2 supports multiple pseudorandom function (PRF) algorithms. Strongswan supports PRFs based on MD5, SHA-1, SHA-2, and AES. Due to published attacks on MD5 and SHA-1, we will select between the AES and SHA-2 variants. The AES-based PRF, AES_XCBC, has not proved popular with implementors, and RFC 7321 downgrades it from SHOULD+ to SHOULD, hence we select the PRF based on SHA-2. SHA_224 and SHA_256 use 32-bit words, while SHA_384 and SHA_512 use 64-bit words, which generally makes the latter a better fit for 64-bit CPUs.

Selected: SHA2_384 PRF

Key exchange

The Diffie-Hellman key exchange algorithm is defined in several "groups", some of which use elliptic curve cryptography. Currently, ECDH is favored, but there is some distrust of the curves defined by NIST. An alternative is the set of curves defined by the European Brainpool working group. Federal encryption standards allow encryption up to "TOP SECRET" level using 384-bit keys.

Selected: Brainpool Elliptic Curve Group 29 (384-bit key)



The history of IPsec on Linux is convoluted due to forks and parallel development.

  • The first IPsec implementations for Linux were Kame and Freeswan (FreeS/WAN), and they were incompatible.
  • Kame project:
    • Drivers were merged into the kernel
    • Userspace tools are called `ipsec-tools`
    • IKE daemon is called 'Racoon'
  • Freeswan project:
    • Was more popular
    • Required its own out-of-tree kernel module
    • Later it gained the ability to use Kame's kernel drivers
  • Kame project is no longer active
  • Freeswan project is no longer active
  • Strongswan and Openswan are forks from Freeswan
  • Libreswan is a fork of Openswan
  • Current popular choices for IKE are Strongswan and Libreswan
    • These still use Kame drivers, no need to compile kernel modules

Kernel configuration


FIXME: /proc/sys/net/core/xfrm_larval_drop

  • It seems like this setting tells the kernel to drop packets while IPsec is coming up rather than freezing them, which can make apps behave badly. Need to look up proper documentation; all I've found so far is old mailing list chatter like this: https://marc.info/?l=bind-users&m=120910796110563&w=2

FIXME: search for other relevant sysctls


  • ip -s xfrm state
  • ip -s xfrm policy



Service control

  • Ubuntu 12.04 Precise: SysV init: /etc/init.d/ipsec
  • Ubuntu 14.04 Trusty: Upstart: /etc/init/strongswan
  • Debian 8 Jessie: Provides both /etc/init.d/ipsec and /lib/systemd/system/strongswan.service
    • This confuses the Puppet Service provider so we had to dpkg-divert --divert /etc/init.d/ipsec-disabled /etc/init.d/ipsec
    • Systemd gets confused when ipsec restart is run. This command should never be issued on a systemd-based system! Instead, use the proper service ipsec restart


  • ipsec status
  • ipsec statusall
  • ipsec listcerts
  • ipsec listcacerts

Verifying operation

To test: start a ping, and tcpdump:

  • alice: ping bob.eqiad.wmnet
  • bob: tcpdump -i eth0 host alice.esams.wmnet

Failure: packets are not encrypted so tcpdump can identify the payload as ICMP:

 17:17:39.937199 IP alice.esams.wmnet > bob.eqiad.wmnet: ICMP echo request, id 23424, seq 1, length 64
 17:17:39.937219 IP bob.eqiad.wmnet > alice.esams.wmnet: ICMP echo reply, id 23424, seq 1, length 64

Success: all tcpdump can see is an Encapsulated Security Payload

 17:37:42.229591 IP alice.esams.wmnet > bob.eqiad.wmnet: ESP(spi=0xceb269f1,seq=0x1), length 116
 17:37:42.229653 IP bob.eqiad.wmnet > alice.esams.wmnet: ESP(spi=0xc9cf28cd,seq=0x1), length 116
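The same check can be scripted. The sketch below classifies tcpdump output lines as ESP or cleartext, using the sample lines above as input. The regular expression assumes tcpdump's default one-line format as shown here, and the helper names are hypothetical.

```python
import re

# Matches the ESP summary tcpdump prints, e.g. "ESP(spi=0xceb269f1,seq=0x1)".
ESP_PATTERN = re.compile(r": ESP\(spi=0x[0-9a-f]+,seq=0x[0-9a-f]+\)")

def is_encrypted(tcpdump_line):
    """True if tcpdump could only identify the packet as ESP."""
    return ESP_PATTERN.search(tcpdump_line) is not None

def cleartext_lines(lines):
    """Return the non-empty lines that were not recognised as ESP."""
    return [line for line in lines if line.strip() and not is_encrypted(line)]

SAMPLE = [
    "17:17:39.937199 IP alice.esams.wmnet > bob.eqiad.wmnet: ICMP echo request, id 23424, seq 1, length 64",
    "17:37:42.229591 IP alice.esams.wmnet > bob.eqiad.wmnet: ESP(spi=0xceb269f1,seq=0x1), length 116",
]

for line in cleartext_lines(SAMPLE):
    print("NOT ENCRYPTED:", line)
```

Note that IKE negotiation itself runs over UDP and would also show up as non-ESP traffic, so a real check would probably need to filter that out first.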


  • APP: appears in logs but is not a valid entry for charondebug config.
    • presence makes the daemon silently not start
  • ASN: low level encoding/decoding (ASN.1, X.509, etc.)
    • boring
  • CFG: configuration management and plugins
    • shows details of config file parsing including assumed defaults for undeclared values.
    • somewhat interesting
    • probably only seen during startup?
  • CHD: CHILD_SA / IPsec SA
    • associates src/dest with "SPI" hex value
    • boring
  • DMN: main daemon setup/cleanup/signal handling
    • only shutdown and startup messages
    • these are useful markers
  • ENC: Packet encoding/decoding encryption/decryption operations
    • interesting only at level 1
  • ESP: libipsec library messages
    • no logging even at level 2
    • package strongswan-plugin-kernel-libipsec not installed
  • IKE: IKE_SA / ISAKMP SA
    • note similarity to CHD
    • task activation: CHILD_CREATE, IKE_INIT, etc
    • authentication via RSA sig success
    • CHILD_SA establish/close/delete
    • reauthentication
    • send/receive delete messages
    • possibly interesting but let's see if we can get the same utility with less verbosity at level 1
    • initial config i copied set this to 2
  • IMC: Integrity Measurement Collector
    • nothing logged even at level 2
  • IMV: Integrity Measurement Verifier
    • nothing logged even at level 2
  • JOB: Jobs queuing/processing and thread pool management
    • only this at level 1:
    • [JOB] spawning 16 worker threads
    • boring
  • KNL: IPsec/Networking kernel interface
    • boring
  • LIB: libstrongswan library messages
    • details plugin loading and unmet deps
    • otherwise boring
  • MGR: IKE_SA manager, handling synchronization for IKE_SA access
    • check-in and check-out messages.
    • boring.
  • NET: IKE network communication
    • "received packet", "waiting for data on sockets".
    • boring.
  • PTS: Platform Trust Service
    • no logging even at level 2
  • TLS: libtls library messages
    • no logging even at level 2
  • TNC: Trusted Network Connect
    • no logging even at level 2

Recommended setting

 charondebug="cfg 2, dmn 2"
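Since an invalid subsystem name such as APP reportedly keeps the daemon from starting without any error, it may be worth validating the string before deploying it. The following is a minimal sketch. The set of accepted names is taken from the list above plus ike, which appears in the howto example quoted elsewhere on this page. The level range -1 to 4 follows strongSwan's documented log levels, and the function name is hypothetical.

```python
# Subsystem names accepted by this sketch; "app" is deliberately excluded.
VALID_SUBSYSTEMS = {
    "asn", "cfg", "chd", "dmn", "enc", "esp", "ike", "imc", "imv",
    "job", "knl", "lib", "mgr", "net", "pts", "tls", "tnc",
}
VALID_LEVELS = range(-1, 5)  # -1 (silent) through 4

def parse_charondebug(value):
    """Parse 'cfg 2, dmn 2' into {'cfg': 2, 'dmn': 2}, or raise ValueError."""
    settings = {}
    for item in value.split(","):
        parts = item.split()
        if len(parts) != 2:
            raise ValueError("expected 'subsystem level', got %r" % item)
        name, level = parts[0].lower(), int(parts[1])
        if name not in VALID_SUBSYSTEMS:
            raise ValueError("unknown subsystem %r" % name)
        if level not in VALID_LEVELS:
            raise ValueError("level %d out of range for %r" % (level, name))
        settings[name] = level
    return settings

print(parse_charondebug("cfg 2, dmn 2"))
```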

Related info

  • actual ipsec traffic does not pass through charon and hence the above will not influence its logging
  • notes on reducing logging at compile time for perf: https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
  • what do we need to know from logging?
    • do the transports continue to function in the face of packet loss or network interruption?
    • for this reason i may need to firewall non-ipsec traffic between my test hosts so that failure is evident
    • though my understanding of how SAs are stored in-kernel makes it seem unlikely that unencrypted traffic would even be attempted unless the SAs are removed
    • assumption: SAs will only be removed/modified by charon



Existing Nagios/Icinga plugins either required separate configurations for each SA, or simply counted them to determine health. The former seemed too burdensome and the latter seemed insufficient, so we wrote our own which parses ipsec statusall output to identify state for each configured connection: established, connecting, or not connected.

  • FIXME: link
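The plugin itself is not linked here, so the following is only a sketch of the approach described above and not its actual code. It assumes that ipsec statusall prints a "Connections:" section listing every configured connection, followed by a "Security Associations" section whose lines look like name[N]: ESTABLISHED ... or name[N]: CONNECTING .... The sample text is a hand-written approximation of that shape, not captured output.

```python
import re

CONN_LINE = re.compile(r"^\s*(\S+):\s")
SA_LINE = re.compile(r"^\s*(\S+)\[\d+\]:\s+(ESTABLISHED|CONNECTING)\b")

def connection_states(statusall_text):
    """Map each configured connection to established/connecting/not connected."""
    states = {}
    section = None
    for line in statusall_text.splitlines():
        if line.startswith("Connections:"):
            section = "connections"
            continue
        if line.startswith("Security Associations"):
            section = "sas"
            continue
        if section == "connections":
            match = CONN_LINE.match(line)
            if match:
                states.setdefault(match.group(1), "not connected")
        elif section == "sas":
            match = SA_LINE.match(line)
            # An established SA wins over a concurrent connection attempt.
            if match and states.get(match.group(1)) != "established":
                states[match.group(1)] = match.group(2).lower()
    return states

# Hand-written approximation of statusall output (assumption, not a real log).
SAMPLE = """\
Connections:
    alice_v4:  192.0.2.1...192.0.2.2  IKEv2
    alice_v6:  2001:db8::1...2001:db8::2  IKEv2
    carol_v4:  192.0.2.1...192.0.2.3  IKEv2
Security Associations (1 up, 1 connecting):
    alice_v4[1]: ESTABLISHED 10 minutes ago, 192.0.2.1[alice]...192.0.2.2[bob]
    alice_v4{1}:  INSTALLED, TRANSPORT, ESP SPIs: ceb269f1_i c9cf28cd_o
    alice_v6[2]: CONNECTING, 2001:db8::1[%any]...2001:db8::2[%any]
"""

for name, state in sorted(connection_states(SAMPLE).items()):
    print(name, state)
```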


Data sources it would be useful to collect via Carbon/Graphite:

  • Request and response counters for IKE init and auth events, from ipsec listcounters
  • Counters for bytes and packets plus replays and failures, from ip -s xfrm state
    • The related command ip -s xfrm policy duplicates most of the above output and adds additional info, but 'state' appears superior.
  • Thread and memory stats from the Status section of ipsec statusall:
    • malloc: sbrk 2568192, mmap 0, used 454432, free 2113760
    • worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 6


Setup Section

  • https://wiki.strongswan.org/projects/strongswan/wiki/ConfigSetupSection
  • strictcrl: do we require a fresh cert revocation list? default no.
    • i guess we want to enable this unless it causes problems
  • cachecrls: default no, i guess that's the most secure unless it causes perf problems. mentions that they may be fetched via http or ldap.
  • uniqueids: confusing. seems like we want the default: yes
  • charondebug: annoyingly terse.
  • https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
    • a howto suggested: charondebug="cfg 2, dmn 2, ike 2, net 2"

Connection Section

  • https://wiki.strongswan.org/projects/strongswan/wiki/ConnSection
  • auto: i see no advantage to any other setting besides 'start'. 'route' would enable on demand, but yeah no advantage unless memory usage or rekey traffic becomes a concern. if they are, i'd argue that the software sucks or we're Doing It Wrong.
  • closeaction: what to do if remote peer unexpectedly closes CHILD_SA. doesn't describe what 'hold' does. mentions interaction with uniqueids checking, which i guess is the reason it defaults to 'none'.
  • compress: obviously gonna cost some cpu but i wonder how much it would affect intercontinental traffic, and especially whether it's worth it for short http blips
  • dpdaction: hmm perhaps we want 'restart' instead of 'none'?
  • inactivity: no default?
  • esp: default cipher suite list looks ok but i should check with csteipp and giuseppe before going to prod.
  • forceencaps:
    • hmm, ESP is incompat with NAT because it encrypts the TCP/UDP header which NAT wants to modify.
    • solution: wrap the ESP in UDP.
    • http://www.watchguard.com/training/vbasics5/WG_VPN/vpn21.htm
    • https://www.rfc-editor.org/rfc/rfc3948 + 3947
      • so this is different than tunnel mode because it's just wrapping ESP in UDP instead of encapsulating the entire original packet.
    • https://en.wikipedia.org/wiki/NAT_traversal#IPsec_traversal_across_NAT
      • mentions that NAT-T may be used to achieve OE, but i guess that wouldn't apply to our situation anyway unless we want to start speaking OE with users' machines
  • ike: another cipher suite declaration
  • keyingretries: interesting that the testsuite set it to 1. i guess for fast fail during testing. default is 3 but maybe we want '%forever'
  • keylife/lifetime: same as above, testsuite set to 20m instead of default 1h.
  • margintime/marginbytes/marginpackets: ah this isn't what i was expecting: it attempts to re-key after this interval, default 9m. lifetime above is max allowable.
  • mark/mark_in/mark_out: i guess this is just to tag connections with a unique id for monitoring. /proc/net/xfrm_stat seems to have disappeared. i see /sys/module/xfrm* but not sure what stats are interesting, if any.
  • mobike: well we don't need mobile ip so i guess it makes sense to disable it, as the test suite does.
  • rekeyfuzz: 100% sounds reasonable but we might have to raise this due to many peers
  • replay_window: not much explanation but sounds like the sort of thing that might need to be increased for high-bdp connections
    • what is the netlink backend? what are the alternatives?
    • https://wiki.strongswan.org/projects/strongswan/wiki/Kernel-libipsec
    • netlink is the default
    • there's also pfkey (kernel, status: experimental) and libipsec (all-userspace)
    • pf_key is some kind of special socket for ipsec to exchange SADB messages with the kernel. unclear what the advantage is, if any.
  • tfc: pads ESP to MTU for Traffic Flow Confidentiality. would increase net traffic but i guess it's useful for paranoia
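The interaction between lifetime, margintime and rekeyfuzz noted above is easier to see with numbers. strongSwan documents the rekey time as lifetime - (margintime + random(0, margintime * rekeyfuzz)). The sketch below computes the resulting window, and the function names are made up for this example.

```python
import random

def rekey_window(lifetime_s, margintime_s, rekeyfuzz_pct):
    """Return (earliest, latest) rekey time in seconds after SA setup."""
    max_fuzz = margintime_s * rekeyfuzz_pct / 100.0
    earliest = lifetime_s - (margintime_s + max_fuzz)
    latest = lifetime_s - margintime_s
    return earliest, latest

def sample_rekey_time(lifetime_s, margintime_s, rekeyfuzz_pct, rng=random):
    """Pick one rekey time uniformly within the window."""
    earliest, latest = rekey_window(lifetime_s, margintime_s, rekeyfuzz_pct)
    return rng.uniform(earliest, latest)

# Defaults mentioned above: lifetime 1h, margintime 9m, rekeyfuzz 100%.
earliest, latest = rekey_window(3600, 540, 100)
print("rekey between %d and %d minutes" % (earliest // 60, latest // 60))
```

With the defaults this gives a window of 42 to 51 minutes. With the test suite's 20m lifetime and the default margin, the window would start only two minutes after the SA is established.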

Cipher proposals

See above: #Cipher selection

IKE master SA

Notation: encryption-integrity[-prf]-dhgroup


Yields IKE proposal: AES_GCM_16_128/PRF_HMAC_SHA2_384/ECP_384_BP

  • Use combined mode authenticated encryption AES_GCM_16_128
  • A separate integrity algorithm is not used because authentication is included in AES_GCM
  • Use SHA2_384's PRF
  • Use Brainpool curves for ECDH, with a 384-bit key
  • The strict flag (!, exclamation mark) is used to restrict the daemon to propose and accept only the specified cipher proposal without appending or accepting default ciphers.

ESP child SAs

Notation: encryption-integrity[-dhgroup][-esnmode]


Yields ESP proposal: AES_GCM_16_128/ESN

  • Use the same authenticated encryption selected for IKE SA
  • Do not define a separate integrity algorithm, as with the IKE SA
  • The PRF is not specified for child SAs because it is inherited from the IKE SA
  • Although the output of 'ipsec statusall' does not confirm use of ecp384bp, this is the only key exchange algorithm accepted due to the strict flag (!)
  • Extended Sequence Number (ESN: RFC 4304) mode is disabled due to kernel crash as of Linux 3.19
  • The strict flag (!, exclamation mark) is used to restrict cipher selection in the same way as in the IKE SA
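The configuration lines themselves are not reproduced on this page. As an illustration of the two notations, the sketch below assembles proposal strings from strongSwan's documented keyword names (aes128gcm16, prfsha384, ecp384bp). These strings are an assumption about what such a configuration could look like, not a copy of the deployed values, and no ESN keyword is included.

```python
def ike_proposal(encryption, prf, dh_group, strict=True):
    """Compose an ike= value; integrity is omitted for AEAD ciphers."""
    proposal = "-".join([encryption, prf, dh_group])
    if strict:
        proposal += "!"  # strict flag: only this proposal is offered/accepted
    return proposal

def esp_proposal(encryption, dh_group=None, strict=True):
    """Compose an esp= value; the PRF is inherited from the IKE SA."""
    parts = [encryption]
    if dh_group is not None:
        parts.append(dh_group)
    proposal = "-".join(parts)
    if strict:
        proposal += "!"
    return proposal

print("ike=" + ike_proposal("aes128gcm16", "prfsha384", "ecp384bp"))
print("esp=" + esp_proposal("aes128gcm16", "ecp384bp"))
```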


  • surprise: left/right is mostly convention, it checks both against provisioned interfaces on startup. however if neither matches it assumes left is local.
  • left/right: can also take value '%any' or a range or subnet
  • leftca/rightca: i guess this is where i would require wmf ca cert to be in the chain.. if that terminology is correct
  • leftfirewall/rightfirewall: 'yes' says that the hosts are blocking traffic to remote and the blocks should be removed once the connection is established. in our case blocking unencrypted traffic is not a priority so i don't see a need for this.
  • leftid/rightid: looks like @fqdn syntax is outdated, it was to avoid resolving to IP. seems like we don't need this field at all since it defaults to left/right.

CA Section


Because of a bug or limitation in IPsec and Strongswan, the MTU of all the IPsec links between Varnish servers has been locked to 1450; see T195365 for troubleshooting and implementation.