Managing multiple SSH agents

From Wikitech

This describes a method for maintaining a separate ssh-agent to hold your ssh key for connecting to Toolforge/CloudVPS.

The problem

You use an ssh-agent to connect to your personal or company systems. You also want to use an agent to connect to Toolforge/CloudVPS, and you created a separate ssh key for that, but you don't want to forward your personal key to Toolforge/CloudVPS systems. If you just add both keys to your existing agent, both get forwarded to Toolforge/CloudVPS. Removing your personal key from the agent every time you want to connect to Toolforge/CloudVPS is a pain. Moreover, you might be connected to your personal systems and to Toolforge/CloudVPS at the same time, so removing the key isn't enough; you need a separate ssh-agent. You don't want to run one agent per connection either, because then you have to type your passphrase for every connection (and you have a nice long, secure passphrase on your key).

This page describes a method for getting your shell to maintain two agents: your primary agent and your Toolforge/CloudVPS agent. When you connect to Toolforge/CloudVPS, you use the existing Toolforge/CloudVPS agent (or create one if it doesn't exist); the rest of the time you use your default agent.

OS X solutions

Using multiple agents via launchd (better)

This has been tested on Mac OS X El Capitan. It should also work on older releases; please update this text if you find that it works with later versions of OS X.

You can start multiple ssh-agents through launchd user LaunchAgents.

To make this work, write the following plist to ~/Library/LaunchAgents/org.wmflabs.ssh-agent.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "">
<plist version="1.0">

Then load the agent with launchctl load ~/Library/LaunchAgents/org.wmflabs.ssh-agent.plist and, if you want it running right away, start it with launchctl start org.wmflabs.ssh-agent.

From then on, an ssh-agent instance is started every time you log in, reachable at /private/tmp/.ssh-agent-cloud.
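To check that a launchd-started agent is actually listening, you can point ssh-add at its socket. The hypothetical agent_state function below does that and interprets the exit status of ssh-add -l as documented in ssh-add(1): 0 means the agent holds keys, 1 means it holds none, and 2 means no agent could be reached. It is a sketch and works for any agent socket path.

```shell
# agent_state SOCKET: hypothetical helper that reports whether an ssh-agent
# answers on SOCKET, based on the exit status of "ssh-add -l":
# 0 = agent holds keys, 1 = agent holds no keys, 2 = no agent reachable.
agent_state() {
  local rc=0
  # Point ssh-add at SOCKET for this one command only.
  SSH_AUTH_SOCK=$1 ssh-add -l > /dev/null 2>&1 || rc=$?
  case $rc in
    0) echo "running, keys loaded" ;;
    1) echo "running, no keys" ;;
    *) echo "not reachable"; return 1 ;;
  esac
}
```

For example, agent_state /private/tmp/.ssh-agent-cloud should print "running, no keys" right after login, before you have added your Toolforge/CloudVPS key.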

Repeat the process for every domain you connect to, giving each agent its own label and socket path.
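Because each additional agent needs its own plist that differs only in its label and socket path, a small shell function can write them for you. This is a sketch under assumptions: the function name write_agent_plist is hypothetical, and the ProgramArguments (the system /usr/bin/ssh-agent with -D to stay in the foreground, as launchd expects of its jobs, and -a to bind a fixed socket) plus the RunAtLoad key are one plausible launchd configuration, not necessarily the exact plist this page was written with.

```shell
# write_agent_plist LABEL SOCKET [DIR]: hypothetical helper that writes
# DIR/LABEL.plist (DIR defaults to ~/Library/LaunchAgents), a launchd user
# agent running the system ssh-agent bound to SOCKET. Assumes LABEL and
# SOCKET contain no XML special characters such as & or <.
write_agent_plist() {
  local label="$1" socket="$2" dir="${3:-$HOME/Library/LaunchAgents}"
  mkdir -p "$dir" || return 1
  # -D keeps ssh-agent in the foreground (launchd jobs must not fork away);
  # -a binds the agent to a fixed socket path.
  cat > "$dir/$label.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$label</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/ssh-agent</string>
    <string>-D</string>
    <string>-a</string>
    <string>$socket</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
}
```

For example, write_agent_plist org.wmflabs.ssh-agent /private/tmp/.ssh-agent-cloud writes the file for the Toolforge/CloudVPS agent; call it again with another label and socket for each further domain, then launchctl load each file. If ssh-agent refuses to start because the socket path already exists (for instance after a crash), removing the stale socket file should let it bind again.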

You can then configure ssh as described in the Linux section below. Note that, at the time of writing, OpenSSH 7.3 (which introduced the IdentityAgent directive used there) is only available on OS X via Homebrew. However, do NOT use Homebrew's ssh-agent in the launch agent, as it does not interact well with launchd.

Run one agent per terminal

You can configure the default Terminal application so that every tab runs its own ssh-agent:

  • Open Terminal
  • Open Terminal preferences
  • Select the settings profile in use from the list on the left (it is usually already selected).
  • Select the 'Shell' tab.
  • Check the box to run a command at startup, and enter eval `ssh-agent` in the field.
  • Also ensure 'Run in Shell' is checked.
  • New tabs you open will now use this setting.
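If you use a different terminal emulator, or prefer not to rely on Terminal's preferences, a roughly equivalent setup can live in a shell startup file such as ~/.bash_profile. This is a sketch: the function name start_tab_agent is hypothetical, and the EXIT trap, which stops the tab's agent when its shell exits, goes beyond what the Terminal setting above does.

```shell
# start_tab_agent: hypothetical helper, e.g. called from ~/.bash_profile,
# that gives the current shell its own ssh-agent.
start_tab_agent() {
  local agent_env
  # "ssh-agent -s" prints Bourne-shell commands that set and export
  # SSH_AUTH_SOCK and SSH_AGENT_PID; capture them, then eval them here.
  agent_env=$(ssh-agent -s) || return 1
  eval "$agent_env" > /dev/null
  # "ssh-agent -k" kills the agent named by SSH_AGENT_PID, so this tab's
  # agent stops when the shell exits instead of lingering.
  trap 'ssh-agent -k > /dev/null 2>&1' EXIT
}
```

Capturing the output before eval matters: eval of an empty string succeeds, so checking the exit status of ssh-agent itself is the only way to notice that starting the agent failed.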

Linux solutions

Using multiple agents via systemd

This requires a Linux distribution that uses systemd as its init system (most current distributions do, e.g. Debian jessie or Ubuntu 15.10 and later).

You can start multiple ssh-agents through systemd user units. The following unit, for example, starts an agent for connecting to Toolforge/CloudVPS; copy it to /etc/systemd/user/ssh-cloud.service (and create a similar unit for each other place you want to connect to):

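 [Unit]
 Description=SSH authentication agent for Toolforge/CloudVPS
 [Service]
 # ssh-agent forks into the background unless started with -D
 Type=forking
 Environment=SSH_AUTH_SOCK=%t/ssh-cloud.socket
 ExecStart=/usr/bin/ssh-agent -a $SSH_AUTH_SOCK
 [Install]
 WantedBy=default.target
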
Then run the following command as your regular user (and similar for the other agent(s)):

systemctl --user enable ssh-cloud

Once the unit is running, the agent listens on the socket ssh-cloud.socket inside $XDG_RUNTIME_DIR. That directory is created automatically when you log in and is usually /run/user/<your UID>/, so for UID 1000 the effective SSH agent socket is /run/user/1000/ssh-cloud.socket.

Start the agent once as follows to check that the systemd user unit works properly. You don't need to do this again; from then on, the unit is started when you first log in.

systemctl --user start ssh-cloud.service
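Because the unit for each destination differs only in its name and socket, you can generate the others rather than copy and edit them by hand. The sketch below uses a hypothetical write_agent_unit function and writes to ~/.config/systemd/user, the per-user unit directory, which needs no root access (/etc/systemd/user, used above, applies to every user on the machine). The generated unit assumes Type=forking, since ssh-agent forks into the background when started without -D, and uses %t, which systemd expands to $XDG_RUNTIME_DIR for user units, so the socket ends up where described above.

```shell
# write_agent_unit NAME [DIR]: hypothetical helper that writes
# DIR/ssh-NAME.service (DIR defaults to ~/.config/systemd/user), a systemd
# user unit whose agent listens on $XDG_RUNTIME_DIR/ssh-NAME.socket.
write_agent_unit() {
  local name="$1" dir="${2:-$HOME/.config/systemd/user}"
  mkdir -p "$dir" || return 1
  # \$SSH_AUTH_SOCK is escaped so that systemd, not this shell, expands it;
  # %t is left for systemd as well.
  cat > "$dir/ssh-$name.service" <<EOF
[Unit]
Description=SSH authentication agent for $name

[Service]
Type=forking
Environment=SSH_AUTH_SOCK=%t/ssh-$name.socket
ExecStart=/usr/bin/ssh-agent -a \$SSH_AUTH_SOCK

[Install]
WantedBy=default.target
EOF
}
```

For example, write_agent_unit production, followed by systemctl --user daemon-reload and systemctl --user enable ssh-production, would set up an agent listening on $XDG_RUNTIME_DIR/ssh-production.socket; "production" is only an example name.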

Finally, whenever you connect to Toolforge/CloudVPS or production via SSH, point your SSH client at the respective agent socket.

If you're using OpenSSH 7.3 or later (available in Debian unstable since 7 August 2016), this is simple: use the new IdentityAgent directive. Wherever you configure the IdentityFile, add the respective SSH agent socket created by the systemd user units above. Here's an example for configuring access to Toolforge/CloudVPS:

 Host *.wmflabs
      User foo
      IdentityFile /home/foo/.ssh/id_cloud
      IdentityAgent /run/user/1000/ssh-cloud.socket
      IdentitiesOnly yes
      ForwardAgent no
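IdentityAgent needs OpenSSH 7.3 or later, and ssh -V prints the client version on stderr, starting with something like OpenSSH_7.2p2. The hypothetical openssh_at_least function below compares such a string against a minimum version; it is a sketch that assumes the usual OpenSSH_MAJOR.MINOR prefix and reports anything else as unparseable.

```shell
# openssh_at_least MAJOR MINOR VERSION: hypothetical helper that succeeds if
# VERSION (the text printed by "ssh -V") is OpenSSH MAJOR.MINOR or newer.
# Returns 1 if it is older, 2 if VERSION doesn't look like "OpenSSH_X.Y...".
openssh_at_least() {
  local want_major="$1" want_minor="$2" v="$3" major minor
  case $v in OpenSSH_*) v=${v#OpenSSH_} ;; *) return 2 ;; esac
  major=${v%%.*}              # "7.2p2 Ubuntu..." -> "7"
  minor=${v#*.}               # "7.2p2 Ubuntu..." -> "2p2 Ubuntu..."
  minor=${minor%%[!0-9]*}     # "2p2 Ubuntu..."   -> "2"
  case $major in ''|*[!0-9]*) return 2 ;; esac
  case $minor in ''|*[!0-9]*) return 2 ;; esac
  [ "$major" -gt "$want_major" ] && return 0
  [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}
```

For example, openssh_at_least 7 3 "$(ssh -V 2>&1)" && echo "IdentityAgent is supported" tells you which of the two approaches on this page applies to your client.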

If you don't have OpenSSH 7.3 yet, set the environment variable SSH_AUTH_SOCK to the respective socket before connecting, e.g.

 export SSH_AUTH_SOCK="/run/user/1000/ssh-cloud.socket"
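Exporting SSH_AUTH_SOCK switches the agent for everything you run afterwards in that shell. If you'd rather choose the agent per command, you can set the variable for a single ssh invocation only. The sketch below assumes the ssh-NAME.socket naming used by the systemd units above; ssh_via is a hypothetical name.

```shell
# ssh_via NAME [ssh arguments...]: hypothetical wrapper that runs ssh with
# SSH_AUTH_SOCK pointing at $XDG_RUNTIME_DIR/ssh-NAME.socket for this one
# command, leaving the shell's own SSH_AUTH_SOCK untouched.
ssh_via() {
  local name="$1"
  shift
  # Fall back to /run/user/<uid> if XDG_RUNTIME_DIR is unset.
  local sock="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/ssh-$name.socket"
  SSH_AUTH_SOCK="$sock" ssh "$@"
}
```

For example, ssh_via cloud followed by your usual ssh arguments uses the Toolforge/CloudVPS agent for that one connection, while other commands in the same shell keep using your default agent.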

The simplest solution

There is an easy answer to this problem, though it's not very flexible. Run two terminals on your workstation. Load a fresh agent in one of them. Always use one to connect to Toolforge/CloudVPS and the other to connect other places.

A more complex solution

The instructions in this section have not been tested by current staff and are left over from the past.

This solution lets you connect to Toolforge/CloudVPS or other hosts from any terminal on your workstation (including inside screen) without having to pick the right one first. It also protects you from accidentally trying to authenticate against Toolforge/CloudVPS with the wrong key.


This solution assumes you are running bash as your local shell. It can probably be adapted for other shells with minimal effort. It involves creating a socket connected to your ssh-agent at a predictable location and using a bash function to change your environment to use the Toolforge/CloudVPS agent when connecting to Toolforge/CloudVPS.

This solution is also geared towards running screen. It's a little more complicated than it would otherwise need to be, because after disconnecting from a screen session and reconnecting to it, SSH_AUTH_SOCK has usually changed. We override it with a predictable location, so that as the agent socket moves around, old screen sessions still reach the current agent.

We start by making our regular agent reachable at a predictable location (a symlink to its current socket) every time we start a new shell. In .bashrc:

 if [ -f ~/.persistent_agent ]; then source ~/.persistent_agent; fi
 persistent_agent /tmp/$USER-ssh-agent/valid-agent

Next, we set up a function specifically for connecting to Toolforge/CloudVPS:

 # ssh into Toolforge/CloudVPS with an isolated agent
 function cloud() {
   (  # subshell, so changing SSH_AUTH_SOCK here doesn't leak into your shell
     unset SSH_AUTH_SOCK  # never fall back to your regular agent
     persistent_agent /tmp/$USER-ssh-agent/cloud-agent
     # add the key if necessary
     if ! ssh-add -l | grep -q cloud-key-rsa; then
         ssh-add ~/.ssh/cloud-key-rsa
     fi
     ssh -A -D 8080
   )
 }

And one for copying files to Toolforge/CloudVPS with scp:

 # scp into Toolforge/CloudVPS with an isolated agent
 function cloudcp() {
   (  # subshell, so changing SSH_AUTH_SOCK here doesn't leak into your shell
     unset SSH_AUTH_SOCK  # never fall back to your regular agent
     persistent_agent /tmp/$USER-ssh-agent/cloud-agent
     # add the key if necessary
     if ! ssh-add -l | grep -q cloud-key-rsa; then
         ssh-add ~/.ssh/cloud-key-rsa
     fi
     scp "$@"
   )
 }
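The grep in cloud and cloudcp matches on the file name that ssh-add -l prints. Keys in the newer OpenSSH key format usually carry a comment such as user@host, and ssh-add -l then shows that comment instead of the file name, so the grep never matches and you are asked for your passphrase on every connection. A more robust check, sketched below as a hypothetical agent_has_key function, compares fingerprints instead.

```shell
# agent_has_key PUBKEY: hypothetical helper that succeeds if the agent named
# by SSH_AUTH_SOCK already holds the key whose public half is in PUBKEY.
agent_has_key() {
  local fp
  # "ssh-keygen -lf FILE" prints "<bits> <fingerprint> <comment> (<type>)";
  # field 2 is the fingerprint.
  fp=$(ssh-keygen -lf "$1" 2>/dev/null | awk '{print $2}')
  [ -n "$fp" ] || return 1
  # "ssh-add -l" lists the loaded keys in the same format, one per line.
  ssh-add -l 2>/dev/null | awk '{print $2}' | grep -qxF -- "$fp"
}
```

With it, the check in cloud and cloudcp could read if ! agent_has_key ~/.ssh/cloud-key-rsa.pub; then ...; this assumes the public key is stored next to the private key with a .pub suffix, which is what ssh-keygen does by default.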

Last, we make sure to clean up old agents once we completely disconnect from the system; otherwise the agent keeps running even when we're not connected to Toolforge/CloudVPS. This is a little tricky: we don't want to kill the agent when we close the first connection we made to Toolforge/CloudVPS, only when we're actually done working. As a proxy for 'done working', I use 'I log out of the last shell I have open on this system'. This is not a great solution, because if the connection dies, or I just quit Terminal instead of explicitly logging out, .bash_logout doesn't get run. Add to .bash_logout:

 # if this is the last copy of my shell exiting the host and there are any agents running, kill them.
 # (w -h USER lists only my sessions, without the header line)
 if [ $(w -h "$USER" | wc -l) -eq 1 ]; then
   pkill -u "$USER" ssh-agent
 fi

Just for good measure, let's add a line to my user crontab that kills any running agents if I'm not logged in:

 # if I'm not logged in, kill any of my running ssh-agents (cron sets $LOGNAME to my user name).
 * * * * * if ! /usr/bin/w | /bin/grep -qw "$LOGNAME"; then /usr/bin/pkill -u "$LOGNAME" ssh-agent; fi > /dev/null 2>&1

Finally, here is the code for the persistent_agent function; the .bashrc snippet above loads it from ~/.persistent_agent:

 ## preconditions and effects:
 ## $validagent already exists and works, in which case we do nothing
 ## SSH_AUTH_SOCK contains a valid running agent, in which case we update $validagent to use that socket
 ## SSH_AUTH_SOCK is empty, in which case we start a new agent and point $validagent at that.
 ## SSH_AUTH_SOCK exists but doesn't actually connect to an agent and there's no existing validagent; we'll start a new one.
 ## end result:
 ## validagent always points to a running agent, either local or your existing forwarded agent
 function persistent_agent() {
   local validagent="$1" validagentdir
   validagentdir=$(dirname "${validagent}")
   # if it's not a directory or it doesn't exist, make it.
   if [ ! -d "${validagentdir}" ]; then
       # just in case it's a file
       rm -f "${validagentdir}"
       mkdir -p "${validagentdir}"
       chmod 700 "${validagentdir}"
   fi
   # only proceed if it's owned by me
   if [ -O "${validagentdir}" ]; then
       # update the timestamp on the directory to make sure tmpreaper doesn't delete it
       touch "${validagentdir}"
       # if the validagent already works, we're done.
       # (ssh-add -l exits with 2 only if it can't reach an agent; 1 just means no keys are loaded)
       SSH_AUTH_SOCK="${validagent}" ssh-add -l > /dev/null 2>&1
       if [ $? -ne 2 ]; then
           export SSH_AUTH_SOCK="${validagent}"
           return 0
       fi
       # ok, the validagent doesn't already work, let's move on towards setting it up.
       # if SSH_AUTH_SOCK is a valid agent, we'll use it.
       ssh-add -l > /dev/null 2>&1
       if [ $? -ne 2 ]; then
           ln -svf "${SSH_AUTH_SOCK}" "${validagent}"
           export SSH_AUTH_SOCK="${validagent}"
           return 0
       fi
       # note - inverting the order of the previous two tests changes behavior from 'first valid agent gets $validagent' to 'most recent valid agent gets $validagent'.
       # ok, at this point SSH_AUTH_SOCK doesn't point to a valid agent (it might be empty or have bad contents)
       # let's just start up a new agent and use that.
       echo "triggering new agent"
       eval "$(ssh-agent -s)"
       ln -svf "${SSH_AUTH_SOCK}" "${validagent}"
       export SSH_AUTH_SOCK="${validagent}"
       return 0
   fi
   # at this point, I failed to own my $validagentdir.  Someone's trying to do something nasty?  Who knows.
   # I've failed to create a validagent.  Announce that and bail.
   echo "Failed to create a valid agent - bad ownership of ${validagentdir}"
   return 1
 }

Note that I already have my regular key loaded:

 ben@green:~$ ssh-add -l
 2048 25:9e:91:d5:2f:be:73:e8:ff:37:63:ae:83:5b:33:e1 /Users/ben/.ssh/id_rsa (RSA)

The first time (in a given day) you connect to Toolforge/CloudVPS, you are prompted to enter the passphrase for your key, and when you get to bastion, it can only see your Toolforge/CloudVPS key:

 ben@green:~$ cloud
 triggering new agent
 Agent pid 32638
 `/tmp/ben-ssh-agent/cloud-agent' -> `/tmp/ssh-YfZWc32637/agent.32637'
 Enter passphrase for /home/ben/.ssh/cloud-key: 
 Identity added: /home/ben/.ssh/cloud-key (/home/ben/.ssh/cloud-key)
 [motd excerpted]
 ben@bastion:~$ ssh-add -l
 2048 60:a2:b5:a5:fe:47:07:d6:d5:78:50:50:ba:50:14:46 /home/ben/.ssh/cloud-key (RSA)

Subsequent connections (until the end of the day, when you log out of your workstation and all your agents are killed) go through without prompting for your passphrase:

 ben@green:~$ cloud
 [motd excerpted]

Copying files means just using cloudcp instead of scp:

 ben@green:~$ cloudcp foo
 foo                                    100%   43KB  43.0KB/s   00:00

But when you log out of bastion (in any connection), your normal key is once again available for connecting to personal or other hosts:

 ben@bastion:~$ logout
 Connection to closed.
 ben@green:~$ ssh-add -l
 2048 25:9e:91:d5:2f:be:73:e8:ff:37:63:ae:83:5b:33:e1 /Users/ben/.ssh/id_rsa (RSA)