IDM/Runbook

From Wikitech

Infrastructure

Production
  • Endpoint: https://idm.wikimedia.org
  • Database: mariadb://idm@m5-master.eqiad.wmnet:3306/idm
  • LDAP DN: cn=bitu,ou=profile,dc=wikimedia,dc=org
  • Servers: idm1001.wikimedia.org, idm2001.wikimedia.org

Staging
  • Endpoint: https://idm-test.wikimedia.org
  • Database: mariadb://idm-test@m5-master.eqiad.wmnet:3306/idm-test
  • LDAP DN: cn=bitu,ou=profile,dc=wikimedia,dc=org
  • Servers: idm-test1001.wikimedia.org
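The database entries are URLs of the form mariadb://user@host:port/database. The sketch below turns such a URL into arguments for the mysql (or mariadb) command-line client, which is useful when connecting by hand. The client flags are standard. Supplying the password through an interactive prompt (`--password` with no value) is an assumption about how credentials are handled on the hosts.

```python
from typing import List
from urllib.parse import urlsplit


def mysql_command(url: str) -> List[str]:
    """Translate a mariadb://user@host:port/database URL into mysql client arguments."""
    parts = urlsplit(url)
    return [
        "mysql",
        f"--host={parts.hostname}",
        f"--port={parts.port or 3306}",  # fall back to the MariaDB default port
        f"--user={parts.username}",
        # No value given: the client prompts for the password interactively.
        "--password",
        f"--database={parts.path.lstrip('/')}",
    ]
```

For example, `mysql_command("mariadb://idm@m5-master.eqiad.wmnet:3306/idm")` yields the host, port, user and database flags for the production database. You can pass the result to `subprocess.run` on a host that is allowed to reach m5-master.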

Bitu is a Django application that runs under uWSGI (see /etc/uwsgi/apps-enabled/bitu.ini or Puppet for the configuration). Apache2 sits in front of uWSGI. It serves static content and routes requests to the uWSGI workers. TLS is terminated by Envoy.

Failed services

The Bitu setup consists of two services and a number of systemd timers. The primary services are:

  • uwsgi-bitu - Main uWSGI service, responsible for running the web interface.
  • rq-bitu - Background worker that pulls jobs from the Redis queues and handles longer-running tasks and notifications.

Timers currently consist of:

  • sync_bitu_username_block - Pulls block list data from Meta and Wikitech. This data is used to check signups for undesired usernames.
  • expire_bitu_signups - Deletes signup requests that have not been activated within a set time frame, freeing up the username.

Any of the timers, as well as the rq-bitu service, can safely be run or restarted at any time without noticeable user impact or data corruption. Restarting the uwsgi-bitu service will briefly interrupt self-service. It will not invalidate user sessions, because session information is stored in the database and survives a restart.
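Sessions survive a restart because Django keeps them in its database rather than in the memory of the uWSGI workers. In Django this behaviour is selected by the SESSION_ENGINE setting. The value below is Django's default database backend. This page does not say whether Bitu uses this default or another database-backed engine such as cached_db, so treat it as an assumption.

```python
# Django session storage: sessions live in the django_session database table,
# so they survive restarts of the uwsgi-bitu workers. This is Django's default.
SESSION_ENGINE = "django.contrib.sessions.backends.db"
```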

If any of the timers fail continuously, create a Phabricator task and tag it with "Bitu".
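To see which units are unhealthy, you can ask systemd for each unit's state. The sketch below does this under a few assumptions:
  • the long-running services are named uwsgi-bitu.service and rq-bitu.service;
  • each timer triggers a service with the same name (e.g. sync_bitu_username_block.service).

Check the real unit names with `systemctl list-timers` before relying on this.

```python
import subprocess
from typing import Callable, Dict

# Long-running services that should always be "active".
SERVICES = ["uwsgi-bitu.service", "rq-bitu.service"]
# Oneshot jobs started by the timers; ASSUMED to share the timer's name.
TIMER_JOBS = ["sync_bitu_username_block.service", "expire_bitu_signups.service"]


def systemctl_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active` (active, inactive, failed, ...)."""
    # is-active exits non-zero for any state other than "active", so no check=True.
    result = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return result.stdout.strip()


def find_problems(state_of: Callable[[str], str] = systemctl_state) -> Dict[str, str]:
    """Map every unhealthy unit to its current state."""
    problems = {}
    for unit in SERVICES:
        state = state_of(unit)
        if state != "active":
            problems[unit] = state
    for unit in TIMER_JOBS:
        # A oneshot job is normally "inactive" between runs; only "failed" is a problem.
        state = state_of(unit)
        if state == "failed":
            problems[unit] = state
    return problems
```

Running `find_problems()` on the active host should return an empty dict when everything is healthy. A persistently failed timer job is the case that warrants a Phabricator task.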

Failover

Bitu is currently installed on a pair of Ganeti VMs, only one of which is active at a time. To fail over to the inactive VM, change the CNAME idm.wikimedia.org to point to the passive host. This change is made in the DNS git repository.
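Before and after changing the CNAME, it helps to confirm which VM idm.wikimedia.org currently resolves to. The sketch below uses the standard library's socket.gethostbyname_ex. On typical Linux resolvers this reports the CNAME target as the canonical name. The candidate host names are the production servers listed under Infrastructure. Cached answers can keep pointing at the old VM until the record's TTL expires.

```python
import socket
from typing import Callable, List, Optional, Sequence, Tuple


def active_host(
    name: str = "idm.wikimedia.org",
    candidates: Sequence[str] = ("idm1001.wikimedia.org", "idm2001.wikimedia.org"),
    resolve: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the candidate VM that `name` currently resolves to, or None."""
    canonical, _aliases, _addresses = resolve(name)
    # For a CNAME, the resolver normally reports the target as the canonical name.
    canonical = canonical.rstrip(".").lower()
    for host in candidates:
        if host.lower() == canonical:
            return host
    return None
```

Call `active_host()` with no arguments to perform a real lookup. The `resolve` parameter exists so the function can be exercised without network access.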

Changing the CNAME is sufficient to change the active web server. However, the pair of VMs also hosts a Redis installation with one active node and one replica. If you need to reboot, reimage or do maintenance on the active node, Redis must be failed over as well. The Redis configuration is managed by Puppet. To initiate a switch-over, update the two Puppet variables profile::idm::redis_master and profile::idm::redis_replicas. For an example, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/930763
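Once Puppet has applied the new redis_master and redis_replicas values, confirm that replication actually switched. The sketch below checks the dicts that redis-py's `Redis.info("replication")` returns for each node. These are the parsed fields of the Redis INFO replication section.

Fetching those dicts, e.g. with `redis.Redis(host=..., port=6379, password=...).info("replication")`, assumes two things this page does not state:
  • Redis listens on its default port;
  • you have the Redis password.

```python
from typing import Dict, Iterable, List


def replication_problems(
    master: str,
    master_info: Dict,
    replicas: Dict[str, Dict],
    master_aliases: Iterable[str] = (),
) -> List[str]:
    """Return human-readable problems found in INFO replication output.

    master_info and each value in replicas are the dicts that redis-py's
    Redis.info("replication") returns for the respective node.
    """
    problems = []
    if master_info.get("role") != "master":
        problems.append(f"{master}: role is {master_info.get('role')!r}, expected 'master'")
    # A replica may know its master by host name or by IP address.
    accepted = {master, *master_aliases}
    for name, info in replicas.items():
        if info.get("role") not in ("slave", "replica"):
            problems.append(f"{name}: role is {info.get('role')!r}, expected a replica")
        elif info.get("master_host") not in accepted:
            problems.append(f"{name}: replicates from {info.get('master_host')!r}, not {master}")
        elif info.get("master_link_status") != "up":
            problems.append(f"{name}: master_link_status is {info.get('master_link_status')!r}")
    return problems
```

An empty list means the new master reports itself as master and every replica follows it with an established link.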

Debugging on the standby host

In some cases it can be useful to debug on the passive IDM host, e.g. when rolling out a new version or preparing a switch-over. The problem is that both Apache and Django expect the Host header to be idm.wikimedia.org. One solution is to install the ModHeader extension for Firefox and create a rule that adds "Host: idm.wikimedia.org" to your requests. To avoid breaking other sites, also add a filter, e.g. "Tab domain filter: idm2001.wikimedia.org".
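For scripted checks, the same trick works without a browser extension: connect to the passive VM but send the public name in the Host header. Below is a minimal sketch with urllib.request. TLS (SNI and certificate verification) still uses the passive host's name, so this assumes Envoy on that VM serves a certificate valid for it. urlopen copies the header onto any redirect it follows, so this suits pages that do not bounce you to the IDP login.

```python
import urllib.request


def passive_request(passive_host: str, path: str = "/",
                    public_name: str = "idm.wikimedia.org") -> urllib.request.Request:
    """Build a request for the passive VM that carries the public Host header."""
    # Apache and Django route on the Host header, so present the public name
    # while the connection itself goes to the passive VM.
    return urllib.request.Request(f"https://{passive_host}{path}",
                                  headers={"Host": public_name})
```

Pass the result to `urllib.request.urlopen(passive_request("idm2001.wikimedia.org"))` to fetch the page from the passive VM.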

When authenticating with idp.wikimedia.org, you will still be redirected to the currently active server. Change the hostname in the URL to idm2001.wikimedia.org (if that is the server you are debugging). The URL contains the information Bitu needs to complete authentication with the IDP.
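The hostname swap can be done with urllib.parse, which leaves the path and query string untouched; those carry the authentication data. The path and ticket parameter in the example below are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_to_passive(url: str, passive_host: str) -> str:
    """Replace the host in a post-login redirect URL, keeping path and query as-is."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=passive_host))
```

For example, `redirect_to_passive("https://idm.wikimedia.org/example/path?ticket=abc123", "idm2001.wikimedia.org")` returns `"https://idm2001.wikimedia.org/example/path?ticket=abc123"`.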

Debug logging

By default only limited logging is enabled, but debugging may require a more verbose logging level. The development configuration shipped with Bitu already enables debug logging to the console. To enable debug logging on staging or production hosts, configure the logger as in the example below:

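```python
# Debug logging for Bitu, to be placed in the host's Django settings.
LOGGING = {
    "version": 1,
    # Don't disable loggers that already exist (e.g. Django's own).
    "disable_existing_loggers": False,
    "handlers": {
        # Write records at DEBUG level and above to a file.
        "file": {
            "level": "DEBUG",
            "class": "logging.FileHandler",
            "filename": "/tmp/bitu_debug.log",
        },
    },
    "loggers": {
        # The "bitu" logger; loggers named bitu.* propagate to it.
        "bitu": {
            "handlers": ["file"],
            "level": "DEBUG",
            "propagate": True,
        },
    },
}
```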

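Loggers are configured in /etc/bitu/settings.py. Django reads its settings when the application starts, so restart uwsgi-bitu after editing the file for the change to take effect.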
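Django passes the LOGGING setting to logging.config.dictConfig by default. That means a candidate configuration can be tried outside Django before touching a production host. The sketch below applies the debug configuration with a configurable file path and logs one message through a child logger. The child logger name "bitu.example" is hypothetical; Bitu's real module loggers may be named differently.

```python
import logging
import logging.config


def bitu_debug_logging(filename: str) -> dict:
    """The debug LOGGING dict from above, with the log file path as a parameter."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.FileHandler",
                "filename": filename,
            },
        },
        "loggers": {
            "bitu": {"handlers": ["file"], "level": "DEBUG", "propagate": True},
        },
    }


def smoke_test(filename: str) -> str:
    """Apply the config as Django would, log one debug line and return the file contents."""
    logging.config.dictConfig(bitu_debug_logging(filename))
    # Hypothetical child logger: it has no level of its own, so it inherits
    # DEBUG from "bitu" and propagates to the file handler.
    logging.getLogger("bitu.example").debug("debug logging works")
    for handler in logging.getLogger("bitu").handlers:
        handler.close()  # flush and release the file
    with open(filename) as fh:
        return fh.read()
```

If `smoke_test` returns text containing the message, the configuration routes bitu.* debug records to the file as intended.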