Portal:Toolforge/Admin/Services
This page contains information regarding the services that are provided internally to Toolforge nodes.
Deployment components and architecture
Information on how the setup is deployed and on the different components involved.
Servers
Usually a couple of VM instances:
They are set up following the 'cold-standby' approach: only one server is active, while the other is just waiting to manually take over in case of disaster or maintenance.
Addressing, DNS and proxy
There is a proxy called deb-tools.wmflabs.org that should point to the active server. TODO: elaborate a bit more about the purpose of this proxy.
Other than that, the servers don't have any special DNS or addressing. They don't have floating IPs.
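For a quick sanity check, the proxy name can be resolved from any host. Note that in Cloud VPS it will usually resolve to the shared web proxy, which then forwards requests to the active server:
dig +short deb-tools.wmflabs.org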
Worth noting that these servers in the tools Cloud VPS project also offer services for the toolsbeta project.
Puppet
The main role in use is role::wmcs::toolforge::services, which in turn uses several important puppet profiles related to the service role itself:
- profile::toolforge::services::basic
- profile::toolforge::services::aptly
(some additional profiles are included in the role, but they aren't specific to services nodes).
The active/standby assignments are controlled by two main hiera keys:
profile::toolforge::services::active_node: tools-sge-services-03.tools.eqiad1.wikimedia.cloud
profile::toolforge::services::standby_node: tools-sge-services-04.tools.eqiad1.wikimedia.cloud
Also, clients of the aptly repo know about the active server by means of this hiera key:
role::aptly::client::servername: tools-sge-services-03.eqiad1.wikimedia.cloud
NOTE: in new toolforge-stretch, the puppet role is role::wmcs::toolforge::services and the relevant hiera key is profile::toolforge::services::active_node.
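To double-check which server a given client currently considers active, one rough approach is to grep the apt sources that puppet writes on the client (the exact file layout is an assumption; puppet manages it):
grep -r 'tools-sge-services' /etc/apt/sources.list /etc/apt/sources.list.d/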
updatetools
Note: this no longer runs on services nodes. It is now part of the 'admin' tool. See https://phabricator.wikimedia.org/T229261
updatetools is a Python script that updates tools and maintainers information to be used by tools.wmflabs.org (source code available at tool-admin-web).
It gets a list of tools (accounts starting with "tools."), reads their .description and toolinfo.json files and adds the information to the tools table in the toollabs_p database. Maintainer information is retrieved by getting all users that belong to the tool's group and using getpwnam() to retrieve user information, which then gets added to the users table.
This script runs, as a service, on the active tools-services-* server, and wakes up every 120 seconds to populate the tables with new data. The database in use is tools.labsdb, which is tools.db.svc.eqiad1.wikimedia.cloud.
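The lookups it performs can be reproduced by hand from any host with the tools NSS/LDAP setup; a minimal sketch (the tool name is hypothetical, and whether getent enumerates all accounts depends on the NSS configuration):
getent passwd | grep '^tools\.'   # list tool accounts
getent group tools.some-tool      # group members are the tool's maintainers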
apt repository
One of the main purposes of this service is to host Debian packages for other servers by means of aptly.
Repositories are declared in puppet, but packages should be added to the aptly repository by hand.
We usually have one repository per operating system and project, i.e:
- stretch-tools
- jessie-tools
- trusty-tools
- stretch-toolsbeta
Some quick examples of packages stored here:
- https://gerrit.wikimedia.org/r/#/admin/projects/labs/toollabs
- https://gerrit.wikimedia.org/r/admin/projects/operations/software/tools-webservice
- https://gerrit.wikimedia.org/r/admin/projects/operations/software/tools-manifest
(among others)
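A minimal sketch of adding a package by hand, assuming the stretch-tools repo from the list above and that the published distribution matches the repo name (the .deb file name is hypothetical, and whether signing is used depends on how the repo was published):
sudo aptly repo add stretch-tools ./tools-manifest_0.20_all.deb
sudo aptly publish update stretch-tools
# if the repo was published unsigned, add --skip-signing:
# sudo aptly publish --skip-signing update stretch-tools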
The repository data, located at /srv/packages, is synced by means of rsync from the active node to the standby node every 10 minutes.
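The sync amounts to something like the following (a sketch only; the actual job, its direction and the rsync module name are defined in puppet):
rsync -av --delete /srv/packages/ rsync://tools-sge-services-04.tools.eqiad1.wikimedia.cloud/packages/
# 'packages' as the module name is an assumption; check the rsyncd
# configuration on the standby node.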
Admin operations
Information on maintenance and administration of this setup.
managing aptly repo
The repository is managed as a standard aptly repo.
health
Some interesting things to check if you want to know the status/health of the server:
- aptly repos are present, and they contain packages, e.g.:
sudo aptly repo list
sudo aptly repo show --with-packages=true stretch-tools
- the disk is not full, e.g.:
df -h /
- rsync is syncing repo data:
sudo systemctl status rsync.service
failover
We don't have a specific failover mechanism other than updating the hiera keys, pointing the DNS name to the standby server, and running puppet everywhere.
Care should be taken not to lose aptly repo data, since regenerating it from scratch can take some time. That's why there is an rsync job syncing the data between the two nodes.
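A minimal sketch of the manual failover steps (hostnames are the ones from the hiera keys above; adjust to the current instances):
# 1. verify the standby holds a fresh copy of the repo data:
sudo systemctl status rsync.service
ls /srv/packages
# 2. swap the hiera keys (profile::toolforge::services::active_node,
#    profile::toolforge::services::standby_node and
#    role::aptly::client::servername) to point at the standby
# 3. repoint the deb-tools.wmflabs.org proxy to the new active server
# 4. run puppet on the services nodes and the aptly clients:
sudo puppet agent --test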
History
This was heavily remodeled when migrating the grid to SGE and to Stretch. Prior to the migration, the services nodes used to host Bigbrother (deprecated) and webservicemonitor (moved to the cron servers).