Monitoring/check dsh groups


This Icinga alert checks if a MediaWiki appserver is a member of the 'mediawiki-installation' DSH group.

Historically, before we had Salt and, later, Cumin, we used DSH to run commands on multiple servers at once. Server groups were plain text files in the repository, and "mediawiki-installation" was one of them. Taking a server out of the pool meant editing this text file.

Nowadays we use conftool/confctl to pool and depool app servers, but scap still refers to this last remaining DSH group.

In the Puppet repo, the list of servers now comes from Hiera, in ./hieradata/common/scap/dsh.yaml, which refers to conftool:

  mediawiki-installation:
    conftool:
      - {'cluster': 'appserver', 'service': 'apache2'}
      - {'cluster': 'api_appserver', 'service': 'apache2'}
      - {'cluster': 'jobrunner', 'service': 'apache2'}
      - {'cluster': 'testserver', 'service': 'apache2'}
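
As a rough illustration of how these cluster/service selectors could resolve to a list of hosts, the Python sketch below filters an in-memory mapping. The data layout (datacenter, then cluster, then host mapped to a list of services) and all host names are assumptions made up for this example, not copied from the repository. Puppet and scap do the real resolution themselves.

```python
# Selectors copied from the mediawiki-installation entry above.
SELECTORS = [
    {"cluster": "appserver", "service": "apache2"},
    {"cluster": "api_appserver", "service": "apache2"},
    {"cluster": "jobrunner", "service": "apache2"},
    {"cluster": "testserver", "service": "apache2"},
]

# ASSUMPTION: hypothetical node data shaped as
# datacenter -> cluster -> host -> list of services.
EXAMPLE_NODES = {
    "dc1": {
        "appserver": {"mw0001.example": ["apache2", "nginx"]},
        "api_appserver": {"mw0002.example": ["apache2"]},
        "other": {"db0001.example": ["mysql"]},
    },
}


def resolve_group(nodes, selectors):
    """Return the sorted host names matching any cluster/service selector."""
    wanted = {(s["cluster"], s["service"]) for s in selectors}
    hosts = set()
    for clusters in nodes.values():
        for cluster, members in clusters.items():
            for host, services in members.items():
                # A host belongs to the group if any of its services
                # matches a (cluster, service) pair from the selectors.
                if any((cluster, svc) in wanted for svc in services):
                    hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    print(resolve_group(EXAMPLE_NODES, SELECTORS))
```

A host in a cluster that no selector names, such as the hypothetical db0001.example, is not part of the group.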

First, make sure there is no existing hardware issue with this server by searching Phabricator for its host name.

The conftool data also lives in the Puppet repo, in ./conftool-data/node. Check whether the affected host name shows up there. If it does not, you can add it.
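
Checking whether a host name appears in the conftool data can be done with a plain text search. A minimal Python sketch, assuming a local checkout of the Puppet repo and that the files under ./conftool-data/node end in .yaml; the function name find_host is hypothetical:

```python
from pathlib import Path


def find_host(conftool_node_dir, hostname):
    """Return (file name, line number) pairs where hostname appears."""
    matches = []
    # ASSUMPTION: the node data files use a .yaml extension.
    for path in sorted(Path(conftool_node_dir).glob("*.yaml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if hostname in line:
                matches.append((path.name, number))
    return matches


# Example call from the root of a Puppet checkout; the host name is a
# placeholder, not a real server:
#   find_host("./conftool-data/node", "mw0001.example")
```

An empty result means the host name does not appear in any of the files searched.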

If the host is listed there but the alert still fires, first run scap pull on the affected server to fetch the latest code, then run pool to add it to the pool. The Icinga alert should recover a little while later.

Alternatively, you can pool the server from a management host such as cumin1001 using conftool commands.
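
For reference, a conftool invocation for pooling a single host might look like the command built below. This is a sketch that assumes the `confctl select name=... set/pooled=yes` form of the CLI; the host name is a placeholder, and whether sudo is needed depends on the management host. The helper only builds and prints the command; it does not run it.

```python
import shlex


def confctl_pool_command(hostname):
    """Build the argument list for pooling one host by name with confctl."""
    # ASSUMPTION: select-by-name plus set/pooled=yes is the desired action.
    return ["confctl", "select", f"name={hostname}", "set/pooled=yes"]


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(shlex.join(confctl_pool_command("mw0001.example")))
```

Printing the command first makes it easy to review the selector before running it by hand on the management host.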