Analytics/Cluster/System Users

From Wikitech

The Analytics Cluster is more multi-tenant than any other system in WMF production. Individual users often use it to run analyses and jobs, but productionized jobs frequently need to run as a user that is not tied to a real person's account. To accomplish this, we create POSIX system users and groups, and then allow real users in a designated group to sudo as the system user.

Example

The Search team wants to productionize jobs that run in Hadoop. Members of the Search team need to be able to schedule and maintain these jobs as a POSIX user that is not a real human user account. They have:

  • System user analytics-search: will run jobs and own files. This user's primary group is also called analytics-search.
  • Group analytics-search-users: Real user accounts for users on the Search team are members of this group. Members of this group are allowed to sudo as the analytics-search user.

The analytics-search system user and group, along with the real user memberships in the analytics-search-users group, must be declared on all analytics cluster nodes.
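
Once this is in place, a member of analytics-search-users acts as the system user by prefixing a command with sudo -u analytics-search. The sketch below only builds that command line; the helper name sudo_as and the whoami command are illustrative assumptions, not part of any WMF tooling.

```python
import shlex


def sudo_as(system_user, command):
    """Return the argv that runs `command` as `system_user` via sudo."""
    if not command:
        raise ValueError("command must not be empty")
    # `sudo -u <user>` runs the command as that user; `--` tells sudo to
    # stop parsing options so the command's own flags are passed through.
    return ["sudo", "-u", system_user, "--", *command]


# `whoami` is a placeholder; a real job command would go here instead.
argv = sudo_as("analytics-search", ["whoami"])
print(shlex.join(argv))  # prints: sudo -u analytics-search -- whoami
```

Passing the result to subprocess.run on a cluster node would execute the command, provided the calling user is a member of the group that holds the sudo privilege.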


WIP Instructions for creating a new analytics system user and groups in Puppet

Let's walk through creating a system user and associated groups for a hypothetical 'sandwich engineering team' :)

Edit modules/admin/data/data.yaml to do the following:

  • Declare the analytics-sandwich user and group
  • Declare the analytics-sandwich-users group and its members, including analytics-sandwich in system_members.
  • Add the analytics-sandwich user to analytics-privatedata-users system_members so it can access Hadoop.
groups:
  analytics-privatedata-users:
    # ...
    system_members: [..., analytics-sandwich]
  # ...
  analytics-sandwich:
    gid: 920 # pick the next gid in the list
    system: true
    members: []
  analytics-sandwich-users:
    gid: 921 # next gid
    description: Group of users for managing sandwich engineering related analytics jobs
    members: [userA, userB, ...etc]
    privileges: ['ALL = (analytics-sandwich) NOPASSWD: ALL']
    system_members: [analytics-sandwich]
# ...
users:
  # ...
  analytics-sandwich:
    ensure: present
    system: true
    uid: 920 # pick the next uid in the list
    gid: 920 # same gid as the analytics-sandwich group declared above
    shell: '/bin/false'
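
Because the declarations above are spread across several entries, it is easy to miss one. The following is a minimal sketch of a consistency check, assuming the relevant part of data.yaml has already been loaded into a Python dict. The function check_system_user is hypothetical, not part of Puppet or the admin module, and it encodes only the rules described on this page.

```python
def check_system_user(data, name):
    """Return a list of problems for system user `name` (empty if none)."""
    groups = data.get("groups", {})
    users = data.get("users", {})
    problems = []

    user = users.get(name)
    group = groups.get(name)
    managers = groups.get(f"{name}-users")

    if user is None:
        problems.append(f"user {name} is not declared")
    if group is None:
        problems.append(f"group {name} is not declared")
    # The system user's gid should point at its own group.
    if user is not None and group is not None:
        if user.get("gid") != group.get("gid"):
            problems.append(f"user {name} gid does not match group {name} gid")

    if managers is None:
        problems.append(f"group {name}-users is not declared")
    else:
        # The sudo rule that lets real users act as the system user.
        rule = f"ALL = ({name}) NOPASSWD: ALL"
        if rule not in managers.get("privileges", []):
            problems.append(f"group {name}-users lacks the sudo rule for {name}")
        if name not in managers.get("system_members", []):
            problems.append(f"{name} is missing from {name}-users system_members")

    # Membership in analytics-privatedata-users is what grants Hadoop access.
    privatedata = groups.get("analytics-privatedata-users", {})
    if name not in privatedata.get("system_members", []):
        problems.append(
            f"{name} is missing from analytics-privatedata-users system_members"
        )
    return problems


# Mirrors the example above; userA and userB are placeholder account names.
data = {
    "groups": {
        "analytics-privatedata-users": {"system_members": ["analytics-sandwich"]},
        "analytics-sandwich": {"gid": 920, "system": True, "members": []},
        "analytics-sandwich-users": {
            "gid": 921,
            "members": ["userA", "userB"],
            "privileges": ["ALL = (analytics-sandwich) NOPASSWD: ALL"],
            "system_members": ["analytics-sandwich"],
        },
    },
    "users": {
        "analytics-sandwich": {
            "ensure": "present",
            "system": True,
            "uid": 920,
            "gid": 920,
            "shell": "/bin/false",
        },
    },
}

print(check_system_user(data, "analytics-sandwich"))  # prints: []
```

An empty list means every declaration described above is present; each string in a non-empty list names one missing or inconsistent entry.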


More to do: Hiera, Kerberos keytabs, etc.