Systemd resource control

From Wikitech

This page describes a mechanism for per-user resource control on servers using systemd.

The mechanism is based on systemd slices, which are in turn based on cgroups. Before systemd, one could use cgred, but that is considered obsolete in the systemd era.

How it works

systemd puts all processes owned by a user into a shared cgroup. The mechanism is elegant and robust.

However, the limits apply to the user as a whole: you cannot set quotas per process or by other criteria, so the mechanism is not very granular.
On the other hand, on a systemd server there is no way for a user to work around these limits (e.g., with rule-based approaches, a regexp that fails to match leaves processes not covered by the limits).

systemd creates 2 basic slices by default:

  • user.slice (for user sessions and procs started by users)
  • system.slice (for system daemons and other units started by systemd itself)

Additionally, each logged-in user is placed in a sub-slice of their own:

  • user-NNNN.slice (NNNN = numeric user id)
  • user-YYYY.slice (YYYY = numeric user id)

Thus, there is a slice tree for resource control.

aborrero@puppetmaster1001:~$ systemctl status
● puppetmaster1001
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2018-08-28 19:38:50 UTC; 5 months 7 days ago
   CGroup: /
           ├─user.slice                                  <----------
           │ └─user-18194.slice                          <----------
           │   ├─session-694762.scope
           │   │ ├─ 4232 systemctl status
           │   │ ├─ 4233 pager
           │   │ ├─32447 sshd: aborrero [priv]
           │   │ ├─32471 sshd: aborrero@pts/0
           │   │ └─32481 -bash
           │   └─user@18194.service
           │     └─[...]
           └─system.slice                                <----------
             ├─lvm2-lvmetad.service
             │ └─449 /sbin/lvmetad -f
             ├─confd.service
             [...]
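
The slice a process belongs to is encoded in its cgroup path, so the owning user can be recovered from that path. The following is a minimal sketch, assuming paths shaped like the ones in the tree above; the function name uid_from_cgroup_path is illustrative and not part of systemd.

```python
import re

# Matches the per-user slice component of a cgroup path, e.g. "user-18194.slice".
_USER_SLICE = re.compile(r"(?:^|/)user-(\d+)\.slice(?:/|$)")


def uid_from_cgroup_path(path):
    """Return the numeric user id encoded in a cgroup path, or None.

    None means the path is not below a per-user slice, for example a
    system daemon living in system.slice.
    """
    match = _USER_SLICE.search(path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Paths shaped like the entries in the tree above.
    print(uid_from_cgroup_path("/user.slice/user-18194.slice/session-694762.scope"))
    print(uid_from_cgroup_path("/system.slice/confd.service"))
```

On a live system, the path for a given process can typically be read from /proc/<pid>/cgroup and passed to this function.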

You can create a generic (default) slice configuration for all users by creating a unit named user-.slice. Limits placed here are applied to every per-user slice.
If you use this templating mechanism, beware that root gets the same resource constraints, especially if you just use sudo (since the sudo process belongs to your personal user slice).
Things like the Puppet agent can consume a lot of resources. You can solve this by creating a root-specific slice configuration (user-0.slice), probably leaving it unrestricted, and then logging in directly as root (rather than logging in as a regular user over ssh and then running sudo).
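
The user-.slice name works as a template because, as described in the systemd.unit documentation, systemd also searches drop-in directories obtained by truncating a unit name after each dash. The sketch below reproduces that truncation rule to show which directory names are considered for a given slice; dropin_dir_names is a hypothetical helper, and the sketch does not model how systemd orders or merges the files it finds.

```python
def dropin_dir_names(unit_name):
    """List drop-in directory names for a unit, most specific first.

    Example: "user-0.slice" yields "user-0.slice.d" and "user-.slice.d".
    """
    stem, dot, suffix = unit_name.rpartition(".")
    if not dot:
        raise ValueError("unit name needs a type suffix: %r" % unit_name)

    names = [unit_name + ".d"]
    idx = len(stem)
    while True:
        # Find the next dash to the left of the previous one.
        idx = stem.rfind("-", 0, idx)
        if idx < 0:
            break
        candidate = "%s.%s.d" % (stem[: idx + 1], suffix)
        if candidate not in names:
            names.append(candidate)
    return names


if __name__ == "__main__":
    for unit in ("user-18194.slice", "user-0.slice"):
        print(unit, "->", dropin_dir_names(unit))
```

Both the regular user and root share the user-.slice.d directory, which is why root inherits the generic limits unless user-0.slice gets its own configuration.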


Checking the configuration

A couple of handy commands to check slices are systemd-cgtop and systemd-cgls:

root@tools-sgebastion-06:~# systemd-cgtop
Control Group                                                                                        Tasks   %CPU   Memory  Input/s Output/s
/                                                                                                      206   10.0   754.3M        -        -
/user.slice                                                                                             22   10.3   177.2M        -        -
/user.slice/user-18194.slice                                                                             9    9.9   145.7M        -        -
/user.slice/user-0.slice                                                                                 7    0.4    19.3M        -        -
/system.slice                                                                                           79    0.0   105.5M        -    
[...]

root@tools-sgebastion-06:~# systemd-cgls
Control group /:
-.slice
├─user.slice
│ ├─user-0.slice
│ │ ├─session-2.scope
│ │ │ ├─ 727 sshd: root@pts/0
│ │ │ ├─ 759 -bash
│ │ │ ├─2720 systemd-cgls
│ │ │ └─2721 pager
│ │ ├─session-6.scope
│ │ │ └─init.scope
│ │ │   ├─1156 sshd: root@pts/1
│ │ │   └─1187 -bash
│ │ └─user@0.service
│ │   └─init.scope
│ │     ├─738 /lib/systemd/systemd --user
│ │     └─739 (sd-pam)
│ ├─user-18194.slice
│ │ ├─user@18194.service
│ │ │ └─init.scope
│ │ │   ├─1145 /lib/systemd/systemd --user
│ │ │   └─1146 (sd-pam)
[...]

If you explicitly enable accounting in the slice config (see the Configuration section), you can check live numbers in the slice unit status:

root@tools-sgebastion-06:~# systemctl status user-18194.slice
● user-18194.slice
   Loaded: loaded
  Drop-In: /etc/systemd/system/user-.slice.d
           └─puppet-override.conf
   Active: active since Mon 2019-02-04 13:20:39 UTC; 34min ago
    Tasks: 9 (limit: 100)
   Memory: 145.7M (high: 100.0M max: 150.0M swap max: 0B)
      CPU: 3min 4.431s
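
For scripting, the same numbers can be read in machine-friendly form with systemctl show. This is a minimal sketch, assuming the MemoryCurrent and TasksCurrent properties are populated (which requires accounting to be enabled for the slice); the helper names are illustrative.

```python
import subprocess


def parse_show_output(text):
    """Parse the KEY=VALUE lines printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def slice_usage(unit):
    """Query live accounting properties of a slice.

    Needs a running systemd; values stay strings because a property may
    be reported as unset when accounting is disabled.
    """
    result = subprocess.run(
        ["systemctl", "show", unit, "-p", "MemoryCurrent", "-p", "TasksCurrent"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_show_output(result.stdout)
```

For example, slice_usage("user-18194.slice") would return a dictionary with the two requested properties for that user's slice.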

Configuration

You can create configuration limits for each of these slices by means of these directives:

[Slice]
# each user can use max this % of one CPU
CPUQuota=10%
# each user can run max this number of tasks/threads
TasksMax=100
# slow down procs if they use more than this memory
MemoryHigh=100M
# if more than this memory is used, OOM killer will step in
MemoryMax=150M
# users can't use swap memory
MemorySwapMax=0
# do accounting but don't limit anything for now
IOAccounting=yes
IPAccounting=yes

You can create an override for all user slices by creating a unit configuration for user-.slice, i.e., the wildcard slice for all users.
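
A minimal sketch of creating such an override by hand follows. It assumes the drop-in lives under /etc/systemd/system/user-.slice.d, as in the status output shown earlier; the file name limits.conf and both function names are hypothetical.

```python
from pathlib import Path


def render_slice_dropin(limits):
    """Render a [Slice] section from a mapping of directive to value."""
    lines = ["[Slice]"]
    lines += ["%s=%s" % (key, value) for key, value in limits.items()]
    return "\n".join(lines) + "\n"


def write_user_template_dropin(limits, root="/etc/systemd/system", name="limits.conf"):
    """Write the drop-in under <root>/user-.slice.d/ and return its path.

    The default root needs root privileges; pass another directory to
    try the function out safely.
    """
    dropin_dir = Path(root) / "user-.slice.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / name
    target.write_text(render_slice_dropin(limits))
    return target
```

After writing the file, systemd has to reload its configuration (systemctl daemon-reload) before the new limits are picked up.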


Puppet examples

At the time of writing, Toolforge uses this mechanism on user-facing bastion servers.

This is present in the profile::toolforge::bastion::resourcecontrol class:

[..]
    # we need systemd >= 239 for resource control using the user-.slice trick
    # this version is provided in stretch-backports
    apt::pin { 'toolforge-bastion-systemd':
        package  => 'systemd udev',
        pin      => 'version 239*',
        priority => '1001',
    }

    $packages = [
        'systemd',
        'udev',
    ]

    package { $packages:
        ensure          => present,
        install_options => ['-t', 'stretch-backports'],
    }

    systemd::unit { 'user-.slice':
        ensure   => present,
        content  => file('profile/toolforge/bastion-user-resource-control.conf'),
        override => true,
    }

    systemd::unit { 'user-0.slice':
        ensure   => present,
        content  => file('profile/toolforge/bastion-root-resource-control.conf'),
        override => true,
    }
[..]
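
Since the user-.slice approach depends on the systemd version, it can be useful to verify the version before relying on it. The sketch below parses the first line of systemctl --version output; the function names and the sample strings in the usage lines are illustrative, and the 239 threshold is taken from the comment in the Puppet code above.

```python
import re


def parse_systemd_version(version_output):
    """Extract the version number from `systemctl --version` output."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version output: %r" % version_output)
    return int(match.group(1))


def supports_user_slice_template(version_output, minimum=239):
    """Return True if the reported version is at least `minimum`."""
    return parse_systemd_version(version_output) >= minimum


if __name__ == "__main__":
    # Illustrative first lines of `systemctl --version` output.
    print(supports_user_slice_template("systemd 239"))
    print(supports_user_slice_template("systemd 232"))
```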