Production shell access

For instructions on accessing public Cloud Services servers, see Help:Access.

This page explains how to access the production Wikimedia cluster.

Remember that production access is extremely sensitive! Take it seriously and immediately contact Tech Ops if you make a mistake or something goes wrong.

Read and remember the server access responsibilities, including the overall philosophy:

  • The Wikimedia Operations Team will do whatever is necessary to keep all machines and services working and running in a secure fashion.
  • Don't by any wilful, deliberate, reckless or unlawful act interfere with the work of another developer or jeopardize the integrity of data networks, computing equipment, systems programs, or other stored information.
  • Don't use Wikimedia facilities for private purposes, including consultancy or any other work outside the scope of official duties or functions for the time being, without specific authorization to do so.
  • This is not your personal machine. Many things that are fine to do on your personal machine are not okay in a production environment.
  • When in doubt ask questions first, act second. In this case forgiveness is much more difficult to get than permission.

Requesting access

Production shell access is granted strictly on an as-needed basis, and is entirely under the purview of the Engineering Department and Operations Department managers. They can approve or deny access for any reason, as security is of the highest priority.

To acquire shell access, you must have projects or responsibilities that require this access on a regular and ongoing basis. Requests based on a one-time need will not be granted. If you have a one-time need for data, request the data instead.

Prerequisites

There are some things you'll need before you start the process.

New users

  1. Read and sign the Acknowledgement of Wikimedia Server Access Responsibilities.
  2. Use this form to create a ticket requesting access.[1]
  3. In the title, replace "RESOURCE" with the resource you need access to and "USER" with your name. (For new user requests, make a separate ticket for each user.)
  4. Add the following information to the description:
    • Your full name
    • Your developer access username (that is, the one you use for Cloud VPS SSH, not Wikitech login. Wikitech shows this as "instance shell account name" in preferences). We will use this as your production shell username.
    • The public key from your SSH keypair.[2] This must not be the same one you use to access Cloud VPS.
    • A detailed reason for your request. In particular, describe which specific servers you need access to and why. We err on the side of giving fewer permissions rather than more, so the more detailed your request, the more likely you are to get all the permissions you need.
  5. Get approvals from the following people. These approvals should be as comments to the Phabricator task, so these people will need a Phabricator account as well. The comments should be made directly through the web interface, not via email.[3]
    • At least one comment of support from a Wikimedia Foundation employee, explaining why it is a good idea to accept your request. The comment of support should be from your supervisor if you're an employee, or from the employee you will be collaborating with if you're not.
    • The project lead where your access will be granted.
  6. For most requests, a three business day waiting period must be observed after the request is filed.[4]
  7. When your request is approved, you will be asked to provide your full legal name, preferred email address for contact, and physical address to the Wikimedia Foundation Legal Team (or your employee contact may forward this information on your behalf). This information will be used to customize a non-disclosure agreement, which you will be asked to read, comprehend, and electronically sign through the Foundation's contract management system. The agreement will be similar to the Volunteer NDA.
  8. Once all other criteria have been met, the Wikimedia Foundation employee who will be supervising your work will coordinate final sign-off by a C-level staff member of the Wikimedia Foundation before your access is granted.

If you feel an unreasonable amount of time has passed, you can comment on the ticket to request an update, or ask the Operations team member on Ops Clinic Duty that week directly.

Additional permissions for existing users

To escalate shell access, you should be working on a project that requires this access on a regular and ongoing basis. Access is not granted for one-time needs; in that case, simply request the data you require.

  1. Read and sign the Acknowledgement of Wikimedia Server Access Responsibilities, if you haven't already.
  2. Use this form to create a ticket requesting access.[1]
  3. In the title, replace "RESOURCE" with the resource you need access to and "USER" with your name. (Group tickets are acceptable when a group is being escalated.)
  4. Add the following information to the description.[5]
    1. Your full name
    2. Your shell username
    3. A detailed reason for your request. In particular, describe which specific servers you need access to and why. We err on the side of giving fewer permissions rather than more, so the more detailed your request, the more likely you are to get all the permissions you need.
  5. A three business day waiting period must be observed after the request is filed.[4]
    • This may not be required when the change is correcting a previous request, but should be followed for escalations that include permissions not previously approved. It may not be required in some other circumstances.

Technical details

Production shell users, their keys, and their permissions are managed in modules/admin/data/data.yaml in the operations-puppet repository.

Setting up your access

Generating your SSH key

First, you'll have to generate an SSH keypair. GitHub has a good help page (note that you can switch between Mac, Windows, and Linux documentation right under the title).

We recommend that you use an ED25519 key (or, alternatively, a 4096-bit RSA key). Do not use DSA keys as they are insecure.

To generate an ED25519 key, run the following command in your terminal:

ssh-keygen -t ed25519

To generate an RSA key, run the following command in your terminal:

ssh-keygen -t rsa -b 4096 -o

Some systems don't support the newer -o option, which saves private keys in a slightly more secure format (OpenSSH rather than PEM), but those should be fairly rare, as the option was introduced in OpenSSH 6.5.

Once your new SSH key is set up, follow the instructions above to submit your access request. Remember: the key you use for production access must be different from the key you use for Cloud VPS (i.e. do NOT paste it into the Openstack field under Special:Preferences on this wiki).
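The requirement that the production key differ from the Cloud VPS key can be checked before you submit your request. The following is a minimal sketch, not an official procedure: the file names are placeholders borrowed from the sample configuration on this page, and key_material and keys_differ are hypothetical helper names. The -f option of ssh-keygen writes the new key pair to the named file instead of the default location.

```shell
# Print the base64 key material (second field) of a public key file.
key_material() {
    awk '{print $2; exit}' "$1"
}

# Succeed only if the two public key files hold different keys.
# Comments and file names are ignored; only the key material is compared.
keys_differ() {
    [ "$(key_material "$1")" != "$(key_material "$2")" ]
}

# Usage (placeholder file names; not run automatically):
#   ssh-keygen -t ed25519 -f ~/.ssh/your_production_ssh_key
#   keys_differ ~/.ssh/your_production_ssh_key.pub \
#               ~/.ssh/your_development_ssh_key.pub && echo "keys differ"
```

Comparing the key material rather than the whole line means that two copies of the same key with different comments are still reported as identical.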

Setting up your SSH config

The standard configuration (for people without root access) is to establish the SSH connection to a bastion host, which then proxies the connection to the target host inside the cluster. To do this, add the following to your SSH config file (usually located at ~/.ssh/config).

Host bast1002.wikimedia.org
    # Direct connection for the bastion host
    ProxyCommand none
    ControlMaster auto

Host *.wikimedia.org *.wmnet !gerrit.wikimedia.org !git-ssh.wikimedia.org
    User your_username_here
    # Everything else goes via bastion acting as a proxy
    ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org
    # Do not offer other identities loaded in ssh-agent
    IdentitiesOnly yes
    IdentityFile ~/.ssh/your_production_ssh_key

Troubleshooting: If ssh first hangs for a few minutes and then produces a lot of "Connection closed by remote host" messages, you have probably created an infinite proxy loop, with connections to bast1002 trying to proxy via bast1002. This may happen if the proxy host name in the second entry does not match the host name in the first entry. It will also happen if you put these entries in the wrong order: for each option, ssh uses the first value it finds among the matching entries and ignores later ones. Since bast1002.wikimedia.org also matches *.wikimedia.org, putting the wildcard entry first applies its ProxyCommand to the connection to bast1002 itself, creating an infinite loop.
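The ordering rule can be checked mechanically. The following is a minimal sketch, not an official tool: bastion_entry_first is a hypothetical helper name, and it assumes that the wildcard pattern is written exactly as *.wikimedia.org and that the keyword is spelled Host, as in the sample configuration.

```shell
# Succeed if the first Host line naming the bastion appears before the first
# Host line containing the *.wikimedia.org wildcard (or if there is no such
# wildcard line at all).
#   $1: path to the ssh config file
#   $2: bastion host name, e.g. bast1002.wikimedia.org
bastion_entry_first() {
    awk -v b="$2" '
        $1 == "Host" {
            for (i = 2; i <= NF; i++) {
                if ($i == b && !specific) specific = NR
                if ($i == "*.wikimedia.org" && !wildcard) wildcard = NR
            }
        }
        END { exit !(specific && (!wildcard || specific < wildcard)) }
    ' "$1"
}

# Usage (not run automatically):
#   bastion_entry_first ~/.ssh/config bast1002.wikimedia.org \
#       || echo "bastion entry is missing or comes after the wildcard entry"
```

To see what ssh itself resolves, ssh -G bast1002.wikimedia.org prints the options it would apply to that host (available in OpenSSH 6.8 and later).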

In the example above, you may replace bast1002.wikimedia.org with the bastion that is physically closest to you.

Advanced: operations config

If you will be setting up new servers or doing other administration work, you can use the advanced configuration below instead. Otherwise, skip this section. If you're not sure, you almost certainly don't need this!

## Production & External Zones
Host iron.wikimedia.org bast1002.wikimedia.org bast2001.wikimedia.org bast3002.wikimedia.org bast4002.wikimedia.org bast5001.wikimedia.org bastion-restricted.wmflabs.org
    StrictHostKeyChecking yes
    ProxyCommand none
    ControlMaster auto
    IdentitiesOnly yes

Host *.wikimedia.org !gerrit.wikimedia.org !git-ssh.wikimedia.org
    User your_username_here
    StrictHostKeyChecking yes
    IdentitiesOnly yes
    IdentityFile ~/.ssh/your_production_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-prod
    ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org

## Internal Zones
Host *.mgmt.eqiad.wmnet *.mgmt.codfw.wmnet *.mgmt.ulsfo.wmnet *.mgmt.esams.wmnet *.mgmt.eqsin.wmnet
    User root
    StrictHostKeyChecking no

Host *.wmnet
    User your_username_here
    StrictHostKeyChecking yes
    IdentitiesOnly yes
    IdentityFile ~/.ssh/your_production_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-prod

Host *.eqiad.wmnet
    ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org

Host *.codfw.wmnet
    ProxyCommand ssh -a -W %h:%p bast2001.wikimedia.org

Host *.esams.wmnet
    ProxyCommand ssh -a -W %h:%p bast3002.wikimedia.org

Host *.ulsfo.wmnet
    ProxyCommand ssh -a -W %h:%p bast4002.wikimedia.org

Host *.eqsin.wmnet
    ProxyCommand ssh -a -W %h:%p bast5001.wikimedia.org

## Networking Equipment
Host *-eqiad.wikimedia.org *-eqord.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org

Host *-codfw.wikimedia.org *-eqdfw.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast2001.wikimedia.org

Host *-esams.wikimedia.org *-knams.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast3002.wikimedia.org

Host *-ulsfo.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast4002.wikimedia.org

Host *-eqsin.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast5001.wikimedia.org

## DEV
Host gerrit.wikimedia.org
    User your_username_here
    StrictHostKeyChecking yes
    ProxyCommand none
    IdentitiesOnly yes
    IdentityFile ~/.ssh/your_development_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-cloud

Host *.wmflabs.org *.wmflabs
    User your_username_here
    IdentityFile ~/.ssh/your_development_ssh_key
    StrictHostKeyChecking no
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-cloud
    ProxyCommand ssh -a -W %h:%p bastion-restricted.wmflabs.org

#-----------------------------------------------------
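The configuration above points UserKnownHostsFile at files inside ~/.ssh/known_hosts.d, and ssh may be unable to record host keys if that directory does not exist. The following is a minimal sketch for preparing it; prepare_known_hosts_dir is a hypothetical helper name, and the file names wmf-prod and wmf-cloud are taken from the configuration above.

```shell
# Create the known-hosts directory and the two files referenced by
# UserKnownHostsFile in the advanced configuration.
#   $1: directory to prepare (defaults to ~/.ssh/known_hosts.d)
prepare_known_hosts_dir() {
    dir="${1:-$HOME/.ssh/known_hosts.d}"
    mkdir -p "$dir" || return 1
    # Keep the directory private to your user.
    chmod 700 "$dir" || return 1
    for name in wmf-prod wmf-cloud; do
        # touch only creates missing files; existing entries are kept.
        touch "$dir/$name" || return 1
    done
}

# Usage (not run automatically):
#   prepare_known_hosts_dir
```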

Known host files

To verify the identity of the hosts you connect to, it is best to enforce StrictHostKeyChecking, which requires a local list of known hosts. A script is available to generate that list and keep it up to date. Check the instructions in the script's header. If you need any help, contact the author.

Before you can use the script, you'll need to bootstrap this setup with at least one bastion host. Disable strict host key checking, ssh to a bastion, and make sure the fingerprint matches what's listed at Help:SSH Fingerprints.
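The fingerprint comparison in the bootstrap step can also be done without logging in. The following is a minimal sketch under stated assumptions: fingerprint_matches is a hypothetical helper name, the expected value must be copied by hand from Help:SSH Fingerprints, and the usage line relies on ssh-keygen reading a key from standard input, which older OpenSSH versions may not support.

```shell
# Compare the fingerprint in one line of `ssh-keygen -l` output with an
# expected value. The line has the form:
#   <bits> <fingerprint> <comment> (<key type>)
#   $1: expected fingerprint, e.g. "SHA256:..."
#   $2: one line of ssh-keygen -l output
fingerprint_matches() {
    actual=$(printf '%s\n' "$2" | awk '{print $2}')
    [ "$actual" = "$1" ]
}

# Usage (needs network access; not run automatically):
#   line=$(ssh-keyscan -t ed25519 bast1002.wikimedia.org 2>/dev/null | ssh-keygen -lf -)
#   fingerprint_matches "SHA256:value-from-Help:SSH-Fingerprints" "$line" \
#       && echo "fingerprint matches"
```

The scanned key is only trustworthy because it is compared against the published fingerprint; the scan on its own proves nothing about the host.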

Security

Do not use SSH agent forwarding (the -A command line option). Agent forwarding does not make it possible to steal your key itself, but it does allow someone with sufficient access on the remote host to hijack your SSH agent and thus your identity, so we do not use it. The -a option (with a lower case "a") disables agent forwarding, and is therefore included in the sample configurations above.
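Agent forwarding can also be switched on in a config file through the ForwardAgent option, which is off by default. The following is a minimal sketch for spotting that; enables_agent_forwarding is a hypothetical helper name, and the check only looks for a plain "yes" value in a single file (it does not follow Include directives).

```shell
# Succeed if the given ssh config file sets ForwardAgent to yes anywhere.
# ssh_config keywords are case-insensitive, hence grep -i; the keyword and
# value may be separated by whitespace or "=".
enables_agent_forwarding() {
    grep -Eiq '^[[:space:]]*ForwardAgent[[:space:]=]+yes' "$1"
}

# Usage (not run automatically):
#   enables_agent_forwarding ~/.ssh/config \
#       && echo "warning: agent forwarding is enabled in your config"
```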

This page used to recommend that you add the following lines to protect against an SSH bug from 2016:

Host *
    UseRoaming no

However, we are now using an updated version which removed the vulnerable options, so you will get an error if your config includes the lines above. Just remove them from your config to connect.

Other tips

Debugging

If your production access has been approved but you aren't able to log in, you can ask for help in the Phabricator ticket for your access request. If you got access a long time ago and it's a new problem, you can file a new ticket and tag it with #operations.

Wherever you ask for help, make sure you include your SSH configuration (but not your key itself!) and the output you get when you run your ssh command with the -v option (verbose mode).
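Collecting this information can be sketched as follows. This is only an illustration: contains_private_key is a hypothetical helper name, the host name in the usage lines is a placeholder, and the check is a simple guard against pasting a private key by accident rather than a complete review of what you share.

```shell
# Succeed if the file contains a private key header such as
# "-----BEGIN OPENSSH PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----".
contains_private_key() {
    grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$1"
}

# Usage (placeholder host name; not run automatically):
#   ssh -v some-host.eqiad.wmnet 2> ssh-debug.log   # verbose output goes to stderr
#   contains_private_key ~/.ssh/config \
#       && echo "do not share this file" \
#       || cat ~/.ssh/config
```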

Notes

  1. The form automatically adds the ticket to the Ops-Access-Requests project so the Operations team will see your request.
  2. You can also put your public key on your wiki user page, in a Phabricator paste, or in a Gerrit patchset you upload, but you can't include it in an email reply to the task.
  3. This protects against email spoofing.
  4. If you request any level of sudo privileges, your request must have a security review at a weekly operations meeting. Sudo access is granted on an extremely limited basis, and will typically apply to the smallest permissions possible (user/process restricted over all). Expect this process to take at least one business week.
  5. Your manager's approval is usually not required, as you've already been granted access to the cluster; the project lead of the cluster you request access to should sign off (if in doubt, ask the Ops Clinic Duty person for the week.)