Portal:Toolforge/Admin/Archive

From Wikitech

This is documentation for Tool Labs admins.

A lot is still missing, so please add any documentation of how things are set up here.

Tools

Creation of new tool

Users create tools themselves; just make sure that toolwatcher is running.

Removal of tool

Log in to tools-login and execute:

sudo su
cd /home/petrb/bin
./rmtool "<name of tool>"

Follow the instructions and answer any questions the script asks. It is an interactive script, so don't run it under nohup.

Disabling tools running on -login

No bots or similar processes should run directly on -login; they should run on the grid. If you see anyone running a bot on -login, do the following:

If they run them in cron

Comment out the jobs and leave a message explaining why you did it.
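As a rough sketch of the commenting-out step, the filter below prefixes every active crontab line with a marker and leaves blank lines and existing comments alone. The function name comment_out_cron and the #DISABLED marker are made up for illustration, and GNU sed is assumed for the -E option.

```shell
# Hypothetical helper: comment out every active line of a crontab read on stdin.
# Blank lines and lines that are already comments are passed through unchanged.
# Note that variable assignments such as MAILTO=... are commented out as well.
comment_out_cron() {
    sed -E '/^[[:space:]]*(#|$)/!s/^/#DISABLED /'
}
```

Assuming the local crontab command accepts -u and reads a new crontab from stdin when given "-", it could be applied as root with: crontab -u <user> -l | comment_out_cron | crontab -u <user> -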

If they run them in a screen

Kill them and execute: /usr/local/sbin/warn-screen <list of pts>

Configuration of instance

Memory

Every instance has memory overcommit disabled. This is done via /etc/sysctl.d/60-vm.overcommit_memory.conf; the file is part of init.pp, so every instance gets it.
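To verify the setting on a running instance, something like the following sketch may help. It assumes that "disabled" corresponds to the Linux setting vm.overcommit_memory = 2 (strict accounting); the actual contents of the puppet-managed file are not shown here. The function name overcommit_disabled is hypothetical.

```shell
# Hypothetical check: succeed only if the kernel overcommit mode is 2.
# Assumption: "overcommit disabled" means vm.overcommit_memory = 2.
# The file argument exists only to make the check easy to try out.
overcommit_disabled() {
    proc_file="${1:-/proc/sys/vm/overcommit_memory}"
    mode=$(cat "$proc_file") || return 2
    [ "$mode" = "2" ]
}
```

On an instance, running overcommit_disabled && echo "overcommit is off" would print the message only when the mode is 2.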

NFS

Every instance needs to use NFS. This is done by putting the instance in the proper puppet class, then forcing a puppet run and rebooting. Without this, instances use Gluster by default.

New instance cookbook

Make sure the new instance doesn't provide a service that needs its own security group. If you fail to add the security group in time, you will have to delete the instance.

Exec nodes

  • Make sure that exec nodes have their own external IP so that identd works and bots can, among other things, connect to IRC

Mail

On all servers, /var/mail is a symlink to /data/project/.system/mail, so the mailboxes are shared across the whole tools project.

There is also tools-mail; only Coren knows what it is for.
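A quick way to confirm the symlink on a given server might look like the sketch below. The function name check_mail_symlink is hypothetical; the default paths are the ones named above.

```shell
# Hypothetical helper: confirm that a path is a symlink to the expected target.
# Defaults are the paths from this page; both can be overridden.
check_mail_symlink() {
    link="${1:-/var/mail}"
    expected="${2:-/data/project/.system/mail}"
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2
        return 1
    fi
    actual=$(readlink "$link")
    if [ "$actual" != "$expected" ]; then
        echo "$link points to $actual, expected $expected" >&2
        return 1
    fi
    echo "$link -> $actual"
}
```

Called without arguments on an instance, check_mail_symlink prints the link and its target on success and returns non-zero otherwise.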

Toolwatcher (tools-login)

There is a daemon on tools-login called toolwatcher. It creates folders for new tools, creates local databases and updates the webservers' configuration. If you reboot tools-login or if it's dead, you need to (re-)start it:

sudo su
service toolwatcher start
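If you are not sure whether the daemon is already up, a guarded start along these lines may be useful. This is only a sketch: it assumes the daemon's command line contains the string "toolwatcher", which is not confirmed here, and the function name ensure_toolwatcher is made up.

```shell
# Hypothetical helper: start toolwatcher only if no matching process exists.
# Run as root. Assumption: the daemon's command line contains "toolwatcher".
ensure_toolwatcher() {
    if pgrep -f toolwatcher >/dev/null; then
        echo "toolwatcher is already running"
    else
        service toolwatcher start
    fi
}
```

Because pgrep -f matches full command lines, avoid calling this from a script whose own name contains "toolwatcher", or the check will always report it as running.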

Access to instances

There is a puppet variable restricted_to. It is set to local-admin on all machines to which access should be restricted.

Howto

List all jobs

$ qstat -u '*'
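The full job list gets long quickly, so a per-user summary can be handy. The sketch below assumes the usual qstat layout of two header lines followed by one job per line with the owner in the fourth column; adjust the column if the local output differs. The function name jobs_per_user is hypothetical.

```shell
# Hypothetical helper: count grid jobs per user from qstat output on stdin.
# Assumption: two header lines, then one job per line, owner in column 4.
jobs_per_user() {
    awk 'NR > 2 && NF >= 4 { count[$4]++ }
         END { for (user in count) print count[user], user }' |
        sort -k1,1nr -k2,2
}
```

Usage: qstat -u '*' | jobs_per_user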

See how much memory is being used

$ qstat -F h_vmem

See information about jobs that finished

$ ssh tools-master
$ qacct --help

local-admin

There is an admin service group called local-admin. It has its own documentation page at http://tools.wmflabs.org/admin/; here is a copy of it in case the webservers are offline:

History of tools

Every hour, the list of tools (service groups) is dumped and committed to the repository at ~tools.admin/var/lib/git/servicegroups by the script ~tools.admin/bin/toolhistory (backup) that runs as the continuous job toolhistory.