Data Engineering/Systems/Kerberos/Administration

Kerberos is a computer-network authentication protocol that works on the basis of tickets to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. The protocol was named after the character Kerberos (or Cerberus) from Greek mythology, the ferocious three-headed guard dog of Hades.[1]

We use it mostly for Hadoop, since it is the only reliable way to add proper authentication for users, daemons, etc.

Current setup

We have two hosts:

  • krb1001.eqiad.wmnet
  • krb2001.codfw.wmnet

Both hosts run a kdc daemon, while kadmind runs only on the master and kpropd only on the replica. In Puppet we set up the master and its replica, usually krb1001 and krb2001 respectively.
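A quick way to check which daemons are active on a given host (krb5-kdc and krb5-admin-server are the systemd units referenced later on this page; the pgrep check for kpropd is just one possible approach):

# Check the kdc and kadmind daemons.
systemctl status krb5-kdc krb5-admin-server
# On a replica, verify that kpropd is running.
pgrep -a kpropd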

Daemons and their roles

  • kdc (Key Distribution Center) - a daemon that authenticates a principal, returning a TGT (Ticket Granting Ticket), and acts as the TGS (Ticket Granting Service).
  • kadmind - a daemon that offers a CLI to read/modify the Kerberos database (add principals, generate keytabs, etc.). Runs only on the master kdc node.
  • kprop - a CLI utility that runs periodically on the master node (via systemd timer), propagating database changes to the replicas. Runs only on the master kdc node.
  • kpropd - a daemon running only on the replicas, able to receive updates sent by kprop and apply them to the local database.

Bootstrap a kerberos host

There are two roles defined in puppet for a kerberos node:

1) master

2) replica

The most common operation should be bootstrapping a new replica, but if you are building a cluster from scratch a new master will need to be created as well.

Master node

The master node is the one that holds the authoritative Kerberos database containing the credentials for a certain realm. The only realm supported up to now is the WIKIMEDIA one. Once you have applied role::kerberos::kdc to the master node, everything should install/configure nicely, but the Kerberos daemons won't be able to start correctly due to the absence of a properly bootstrapped database. In order to create a database, execute the following on the master node:

sudo kdb5_util create -s -r WIKIMEDIA

You will be prompted for the database master password; the one for the WIKIMEDIA realm is stored in pwstore.

At this point you should be able to start the kdc and kadmind daemons.
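For example (krb5-kdc and krb5-admin-server are the same systemd units mentioned in the replica steps below):

sudo systemctl start krb5-kdc krb5-admin-server
sudo systemctl status krb5-kdc krb5-admin-server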

Replica node

The replica node needs to be bootstrapped like the master node, but it also needs a few extra steps to allow replication with kprop/kpropd to work. Everything should already be deployed by Puppet, but there are some manual steps needed:

  • Log in to the master node (not the replica, this is important) and create the host principal for the replica. For example:
  • Generate the principal and keytab for the replica, writing the keytab to a temporary location:
    elukey@krb1001:~$ sudo kadmin.local
    Authenticating as principal host/admin@WIKIMEDIA with password.
    kadmin.local: addprinc -randkey host/krb2001.codfw.wmnet@WIKIMEDIA
    kadmin.local: ktadd -k /tmp/krb2001.keytab host/krb2001.codfw.wmnet@WIKIMEDIA
    
  • Stop krb5-kdc and krb5-admin-server on the replica host
  • Copy the /tmp/krb2001.keytab keytab to the replica node and rename it to /etc/krb5.keytab
  • Restart the daemons (a sketch of these last steps follows this list)
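A minimal sketch of the last steps, assuming the keytab was generated on the master as /tmp/krb2001.keytab as in the example above and then transferred to the replica with your usual method (the ownership and permissions shown are a common choice, not a requirement):

# On the replica: stop the Kerberos daemons.
sudo systemctl stop krb5-kdc krb5-admin-server
# Install the copied keytab with tight permissions.
sudo mv /tmp/krb2001.keytab /etc/krb5.keytab
sudo chown root:root /etc/krb5.keytab
sudo chmod 600 /etc/krb5.keytab
# Restart the daemons.
sudo systemctl start krb5-kdc krb5-admin-server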

At this point it should be possible to execute the replicate-krb-database.service on the master node to see if database replication works correctly. The service is a very basic script that dumps the current status of the Kerberos database and sends it to each replica listed in Hiera via kprop. On every replica the kpropd daemon should be listening on port 754 to get updates from the master (Puppet should take care of setting up kpropd with its related ACLs).
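One way to trigger a replication run manually and check its outcome:

# On the master node.
sudo systemctl start replicate-krb-database.service
sudo journalctl -u replicate-krb-database.service -n 50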

Good docs to use as reference

https://web.mit.edu/kerberos/krb5-devel/doc/admin/install_kdc.html

Backups

There are two main workflows to save the Kerberos database data:

  • replicate-krb-database (systemd timer) that periodically dumps the database and replicates it to all the replicas via kprop.
  • backup-kdc-database (systemd timer), running on all nodes, that periodically takes a snapshot of the local database and saves it under /srv.

Why do we need two? If the master database gets corrupted/inconsistent and is then replicated to all the replicas, we end up with the wrong data everywhere. Keeping a local series of database snapshots (especially on the master) should ensure there is at least one good version of the database to restore.

The master db snapshots are also saved periodically in Bacula.
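A snapshot can be restored with kdb5_util load. A minimal sketch, assuming a dump file under /srv (the exact path and file name are set by the backup-kdc-database script, so check them first):

# List the local snapshots.
ls -lR /srv | grep -i kdc
# Stop the daemons, load the chosen dump (the file name below is hypothetical), restart.
sudo systemctl stop krb5-kdc krb5-admin-server
sudo kdb5_util load /srv/kdc_database.dump
sudo systemctl start krb5-kdc krb5-admin-server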

Handling failures and failover

In the ideal scenario, when nothing is broken or under maintenance, this is the configuration:

  • krb1001 is the master KDC node. It runs the kdc and kadmind daemons, together with the replicate-krb-database timer. Any credential (principal, keytab) management (add/remove/etc.) should only be done on this host via the kadmin.local CLI tool. An rsync server allows puppetmaster1001 to pull new keytabs (in order to add them to the Puppet private repo).
  • krb2001 is one of the KDC replicas. It runs the kdc and kpropd daemons, but not kadmind.

As explained in the above sections, any Kerberos client has a list of all the available kdc daemons and can fail over transparently if a host is suddenly unreachable. Replication is trickier, since:

  • on krb1001, the replicate-krb-database timer periodically dumps the content of the Kerberos database to a file and uses the kprop tool (not to be confused with kpropd) to send the new database snapshot to all the replica hosts.
  • on krb2001, one of the replicas, kpropd listens on port 754 for updates coming from krb1001 and applies them to the local copy of the database if needed (a quick check is sketched below).
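To verify that a replica is ready to receive updates, check that kpropd is listening on port 754:

# On a replica: confirm that kpropd is bound to the krb5_prop port (754).
sudo ss -tlnp | grep ':754'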

What happens if krb2001 goes down? Not much: the clients will not notice it, but Hiera will need to be changed so that replication does not fail (namely, removing krb2001 from the list of kdc hosts).

What happens if krb1001 goes down? This use case is trickier and needs to be reviewed:

  1. The Hiera configuration needs to be changed to remove krb1001 from the available kdc hosts, and to set krb2001 as the active kadmind master.
  2. In this way, all the configuration that krb1001 uses to manage credentials and rsync them to puppetmaster1001 will be added to krb2001.
  3. The Hiera configuration related to the list of kdc servers needs to be updated as well, since the clients try to contact the kdc hosts in the order that the config specifies. For example, changing a password on krb2001 (the new master) while krb1001 is still listed as the first kdc server will cause login failures for the related principal until the new credentials are propagated to krb1001 via replicate-krb-database.

Example: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/542112/

The above step is tricky since we have to remember that every Kerberos host has its own local database with credentials. The master node uses kprop to replicate credentials, so a client can hit any kdc daemon and obtain consistent authentication. But if the master node changes and credentials are modified on the new host, they need to be propagated back when the old master node is ready to take back its leadership (say, after maintenance). This should be handled transparently by Puppet when changing the Hiera configuration, but the operator needs to triple check that everything happens correctly.
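To double check which KDCs the clients are actually configured to contact (and in which order), you can look at the WIKIMEDIA realm stanza in /etc/krb5.conf on any Kerberos client; the exact layout of the file may differ slightly from what this grep assumes:

# Show the realm configuration, including the kdc and admin_server entries.
grep -A 6 'WIKIMEDIA' /etc/krb5.conf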

Manage principals and keytabs

In the Kerberos world, every user is called a 'principal', and it may correspond to a real human user or to a service. In the former case, the user simply runs kinit, inputs a password and receives a TGT (that will be used later on to get access to services, etc.). In the latter case, there is a convenient way to store the credentials so that a service can bootstrap transparently, without any manual password input by the operator, namely the keytab.
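For example (the principal names and keytab path below are purely illustrative):

# Human principal: interactive authentication, then inspect the ticket cache.
kinit elukey
klist
# Service principal: non-interactive authentication using a keytab.
kinit -kt /etc/security/keytabs/hadoop/hdfs.keytab hdfs/an-example-host.eqiad.wmnet@WIKIMEDIA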

Check if a user has a principal

Log in to the current kadmin primary (see the Hiera key kerberos_kadmin_server_primary) and execute:

sudo manage_principals.py get $username-that-you-want-to-check

Create a principal for a real user

In this case, the workflow is to generate the principal with a temporary password, and then allow the user to change it as soon as possible. This is currently done by logging in to the current kadmin primary (see the Hiera key kerberos_kadmin_server_primary, or just log in to one of the krb nodes: a motd explains which servers should not be used) and using the manage_principals.py command like the following:

  • sudo manage_principals.py get elukey
  • sudo manage_principals.py create elukey --email_address=ltoscano@wikimedia.org

The second command is the interesting one, since it does the following:

  1. A principal is created via kadmin.local.
  2. A temporary random password is set, together with its expiry in seconds.
  3. An email is sent to the email_address provided, with instructions on how to change the password. It is sufficient to log in on a node that is a Kerberos client, run kinit and follow the instructions (see the sketch after this list).
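From the user's point of view the flow looks roughly like this (the username is just an example; with MIT Kerberos an expired temporary password triggers a change-password prompt at kinit time):

# Authenticate with the temporary password; kinit will prompt for a new one if it has expired.
kinit elukey
# Alternatively, change the password explicitly.
kpasswd elukey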

When you add a Kerberos principal, please remember to add krb: present to the user's entry in Puppet's admin/data.yaml.

Create a keytab for a service

In order to properly authenticate a service, we need to:

  • create the service principal (example: hdfs/analytics1028.eqiad.wmnet@WIKIMEDIA)
  • create the keytab for the principal (that eventually will be stored on the host to be used by a daemon/service).

There is a script on the krb nodes that can help with this: generate_keytabs.py. It takes as input a file listing, one row per item, what to create (principals, keytabs, etc.), and it stores the output (e.g. keytab files) in a known location, by default under /srv/kerberos/keytabs.

Here's an example config which creates a test-user principal and keytab for the server sretest1001.eqiad.wmnet:

sretest1001.eqiad.wmnet,create_princ,test-user
sretest1001.eqiad.wmnet,create_keytab,test-user

You need to pass the realm, which is WIKIMEDIA in our setup:

sudo generate_keytabs.py --realm WIKIMEDIA list.txt

The keytabs are deployed in Puppet via the secret module (see profile::kerberos::keytabs), so they need to be copied to puppetmaster1001 in some way. There is a handy rsync server on the krb hosts that offers a module (srv-keytabs) to copy the keytab directory, reachable only from puppetmaster1001 and protected with user+password authentication (see /srv/kerberos/rsync_secrets_file on the krb hosts). For example, on puppetmaster1001 you can do:

mkdir /home/elukey/keytabs
rsync -r kerb@krb1001.eqiad.wmnet::srv-keytabs ./keytabs
sudo chown -R gitpuppet:gitpuppet ./keytabs
sudo cp -a ./keytabs/* /srv/private/modules/secret/secrets/kerberos/keytabs/
rm -rf /home/elukey/keytabs

Another option is of course to rsync directly to the right location in the private repo and chown -R gitpuppet:gitpuppet.

Finally, you need to commit the created keytabs/principals to the private repo. Make sure to also add dummy keytabs to the labs.private repo, since that is what the Puppet compiler uses.

Create a custom principal and keytab entry

Sometimes it is necessary to create principals and corresponding keytab entries that do not match the format or structure dictated by manage_principals.py and generate_keytabs.py. An example of this is when creating a new principal and keytab to match a DNS alias for a service, rather than a hostname. We can use kadmin.local to do this:

Creating a principal

Use the addprinc command of kadmin.local.

root@krb1001:/home/btullis# kadmin.local addprinc -randkey presto/analytics-test-presto.eqiad.wmnet@WIKIMEDIA

Creating a keytab entry for a given principal in a specific keytab file

Use the ktadd command of kadmin.local. Ensure that the -norandkey parameter is added and specify the keytab file with -k.

root@krb1001:/home/btullis# kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/an-test-coord1001.eqiad.wmnet/presto/presto.keytab presto/analytics-test-presto.eqiad.wmnet@WIKIMEDIA
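To verify the result, the keytab entries can be listed with klist (same path as the example above):

# List the principals and key versions stored in the keytab.
klist -kt /srv/kerberos/keytabs/an-test-coord1001.eqiad.wmnet/presto/presto.keytab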

Reset a kerberos user password

If a user needs their kerberos password reset, use the following command:

razzi@krb1001:~$ sudo manage_principals.py reset-password razzi --email_address=rabuissa@wikimedia.org

Delete a principal for a real user

This is very simple; here is an example for the user elukey@WIKIMEDIA:

elukey@krb1001:~$ sudo manage_principals.py delete elukey@WIKIMEDIA

Delete Kerberos principals and keytabs when a host is decommissioned

When decommissioning a host, some Kerberos credentials for various daemons/services may need to be dropped. This guide assumes that the host has already been removed from any production service etc., so it is not actively using any of the following credentials. If you are not sure, please do not proceed and check Puppet first.

As a first step, it is good to verify which service principals have been created for the host:

elukey@krb1001:~$ sudo manage_principals.py list *analytics1042*
HTTP/analytics1042.eqiad.wmnet@WIKIMEDIA
hdfs/analytics1042.eqiad.wmnet@WIKIMEDIA
yarn/analytics1042.eqiad.wmnet@WIKIMEDIA

Drop all the principals via the same script, for example:

sudo manage_principals.py delete HTTP/analytics1042.eqiad.wmnet@WIKIMEDIA
...
...

In this case we have three service principals, so we have probably created keytabs for them too. Let's check:

root@krb1001:/srv/kerberos/keytabs# find analytics1042*
analytics1042.eqiad.wmnet
analytics1042.eqiad.wmnet/hadoop
analytics1042.eqiad.wmnet/hadoop/HTTP.keytab
analytics1042.eqiad.wmnet/hadoop/yarn.keytab
analytics1042.eqiad.wmnet/hadoop/hdfs.keytab

The directory on the kerberos masters is only a backup, so it is safe to just manually rm the host directory. Then check on puppetmaster1001:

elukey@puppetmaster1001:/srv/private$ find -name *analytics1042*
./modules/secret/secrets/kerberos/keytabs/analytics1042.eqiad.wmnet

Drop the directory and commit, and you are done!
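A possible cleanup sequence on puppetmaster1001, heavily dependent on local conventions (the commit step assumes the private repo accepts direct commits as root, so double check the paths and workflow before running anything):

cd /srv/private
# Remove the decommissioned host's keytab directory from the secret module.
sudo rm -r modules/secret/secrets/kerberos/keytabs/analytics1042.eqiad.wmnet
# Commit the removal to the private repo.
sudo git add -A modules/secret/secrets/kerberos/keytabs
sudo git commit -m 'Remove Kerberos keytabs for decommissioned analytics1042'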

List of principals and their keytabs for Hadoop

The following configuration files are the ones used for the Hadoop test cluster. They are meant to be used with the generate_keytabs.py script.

Hadoop Master node

analytics1028.eqiad.wmnet,create_princ,HTTP
analytics1028.eqiad.wmnet,create_princ,hdfs
analytics1028.eqiad.wmnet,create_princ,yarn
analytics1028.eqiad.wmnet,create_princ,mapred

analytics1028.eqiad.wmnet,create_keytab,HTTP
analytics1028.eqiad.wmnet,create_keytab,hdfs
analytics1028.eqiad.wmnet,create_keytab,yarn
analytics1028.eqiad.wmnet,create_keytab,mapred

Hadoop Master (standby) node

analytics1029.eqiad.wmnet,create_princ,HTTP
analytics1029.eqiad.wmnet,create_princ,hdfs
analytics1029.eqiad.wmnet,create_princ,yarn
analytics1029.eqiad.wmnet,create_princ,mapred

analytics1029.eqiad.wmnet,create_keytab,HTTP
analytics1029.eqiad.wmnet,create_keytab,hdfs
analytics1029.eqiad.wmnet,create_keytab,yarn
analytics1029.eqiad.wmnet,create_keytab,mapred

Hadoop worker node

analytics1031.eqiad.wmnet,create_princ,HTTP
analytics1031.eqiad.wmnet,create_princ,hdfs
analytics1031.eqiad.wmnet,create_princ,yarn

analytics1031.eqiad.wmnet,create_keytab,yarn
analytics1031.eqiad.wmnet,create_keytab,hdfs
analytics1031.eqiad.wmnet,create_keytab,HTTP

Hadoop coordinator (please note the special need for Oozie to have a single keytab for two principals)

analytics1030.eqiad.wmnet,create_princ,HTTP
analytics1030.eqiad.wmnet,create_princ,hdfs
analytics1030.eqiad.wmnet,create_princ,yarn
analytics1030.eqiad.wmnet,create_princ,hive
analytics1030.eqiad.wmnet,create_princ,oozie
analytics1030.eqiad.wmnet,create_princ,HTTP-oozie
analytics1030.eqiad.wmnet,create_princ,analytics

analytics1030.eqiad.wmnet,create_keytab,HTTP
analytics1030.eqiad.wmnet,create_keytab,hdfs
analytics1030.eqiad.wmnet,create_keytab,yarn
analytics1030.eqiad.wmnet,create_keytab,hive
analytics1030.eqiad.wmnet,create_keytab,oozie
analytics1030.eqiad.wmnet,create_keytab,HTTP-oozie
analytics1030.eqiad.wmnet,merge_keytab,HTTP-oozie,oozie
analytics1030.eqiad.wmnet,create_keytab,analytics

Druid worker (only kerberos credentials to access HDFS, not to authenticate Druid's clients)

analytics1041.eqiad.wmnet,create_princ,druid

analytics1041.eqiad.wmnet,create_keytab,druid

Hadoop UIs

analytics1039.eqiad.wmnet,create_princ,HTTP
analytics1039.eqiad.wmnet,create_princ,hue

analytics1039.eqiad.wmnet,create_keytab,HTTP
analytics1039.eqiad.wmnet,create_keytab,hue

References

  1. https://en.wikipedia.org/wiki/Kerberos_(protocol)