Tendril is a tool for analytics and performance tuning of the MariaDB servers, developed by Springle (with features added ad hoc as required) and used by the DBA team. For security reasons it is accessible only to people who have signed an NDA.
Prior to October 2015, the Ishmael tool provided a similar service.
- Front end (vanilla PHP) runs on dbmonitor1001.wikimedia.org and dbmonitor2001.wikimedia.org; its scripts are in /srv/tendril/
- Backend database is currently on db1115.eqiad.wmnet (previously db1011.eqiad.wmnet) and uses the MariaDB event scheduler and the FEDERATEDX storage engine.
- Uses a pull method for monitoring with one watchdog database client connection to each host.
- Entirely and intentionally separate from ganglia and mediawiki-config to allow sanity checks both ways.
- Uses Labs LDAP for authentication (restricted to people who have signed an NDA).
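The pull model in the list above can be illustrated with a short Python sketch. It is an analogy only: Tendril itself does its polling inside MariaDB with the event scheduler and FEDERATEDX, and the `Watchdog` class, the `fake_connect` helper, and the host names below are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

# In this sketch a "connection" is just a callable returning status counters.
Connection = Callable[[], Dict[str, int]]


class Watchdog:
    """Polls one host over a single, reused connection (pull model)."""

    def __init__(self, host: str, connect: Callable[[str], Connection]):
        self.host = host
        self._connect = connect
        self._conn: Optional[Connection] = None

    def poll(self) -> Optional[Dict[str, int]]:
        """Return the host's counters, or None if the host is unreachable."""
        try:
            if self._conn is None:
                # Opened once, then reused on every later poll.
                self._conn = self._connect(self.host)
            return self._conn()
        except ConnectionError:
            # Drop the broken connection; the next poll will reconnect.
            self._conn = None
            return None


def poll_all(watchdogs: Iterable[Watchdog]) -> Dict[str, Optional[Dict[str, int]]]:
    """One polling round: the monitor pulls from every host in turn."""
    return {w.host: w.poll() for w in watchdogs}


def fake_connect(host: str) -> Connection:
    """Stand-in for a real database client; 'db-down' simulates an outage."""
    if host == "db-down":
        raise ConnectionError(host)
    return lambda: {"threads_running": len(host)}


watchdogs = [Watchdog(h, fake_connect) for h in ("db-a", "db-bb", "db-down")]
result = poll_all(watchdogs)
print(result)
```

Because the monitor initiates every request, an unreachable host shows up as a missing sample on the monitoring side rather than depending on the host to report its own failure.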
Tendril offers a lot of functionality; a few highlighted features are described here:
There are many different report views, such as:
- Slow queries, e.g. slow queries from MediaWiki requests and slow analytics queries (from user 'research').
- An ad-hoc cluster-wide INFORMATION_SCHEMA view (mostly the standard schema plus a server_id field).
- A cluster-wide query process list in near real time (10-second intervals).
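To make the idea of a cluster-wide view with an extra server_id field concrete, here is a minimal Python sketch that merges per-host process list rows and filters them for slow queries. The column names follow INFORMATION_SCHEMA.PROCESSLIST; the function name, the sample rows, the numeric server ids, the `wikiuser` account, and the 60-second threshold are assumptions made for illustration, not Tendril's actual schema or settings.

```python
from typing import Dict, Iterable, List


def merge_with_server_id(per_host_rows: Dict[int, Iterable[dict]]) -> List[dict]:
    """Flatten per-host rows into one list, tagging each row with its server_id."""
    merged = []
    for server_id, rows in sorted(per_host_rows.items()):
        for row in rows:
            merged.append({"server_id": server_id, **row})
    return merged


# Hypothetical process list snapshots, keyed by a made-up numeric server id.
per_host = {
    1: [{"ID": 10, "USER": "wikiuser", "TIME": 2, "INFO": "SELECT 1"}],
    2: [{"ID": 7, "USER": "research", "TIME": 95, "INFO": "SELECT COUNT(*) FROM t"}],
}

cluster = merge_with_server_id(per_host)

# A cluster-wide "slow query" report is then a single filter over the merged rows.
slow = [row for row in cluster if row["TIME"] >= 60]
print(slow)
```

The server_id tag is what lets one query or report span every host while still identifying where each row came from.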
Based on actual reported configuration (no dependence on coredb or wmf-config/db-eqiad.php)
Information about each database host (e.g. IP, RAM, uptime), with interactive graphs and charts showing its recent query activity and various InnoDB statistics.
Also linked to from most other reports.
Tendril, or more accurately its host and MariaDB server instance (db1115 is the active host as of October 2019), provides the following services. They are documented here so that we know what will be affected during maintenance:
- Tendril database monitoring, as documented on this page, served at https://tendril.wikimedia.org
- Dbtree, the public version of the Tendril tree view, at https://dbtree.wikimedia.org
- Both Dbtree and Tendril are served by the dbmonitor front end and use the tendril schema
- Zarcillo database inventory, used manually by DBAs (dashboard URL to be developed; some CLI tools on cumin hosts)
- Database backup metadata: backups are started on cumin and dbprov hosts, use zarcillo, and alert on backup status via Icinga (db1115)
- A systemd unit on the 2+2 Grafana servers in eqiad and codfw, which automatically adds production databases from zarcillo to monitoring (alerting on Icinga)
- The last three services use the zarcillo schema
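As a rough illustration of how production databases might be added to monitoring automatically from an inventory, the Python sketch below turns inventory rows into a JSON list of monitoring targets grouped by datacenter. Everything in it is an assumption: the row fields (`fqdn`, `port`, `datacenter`, `production`), the example.org host names, and the output layout are hypothetical and do not reflect the real zarcillo schema or the format the systemd unit produces.

```python
import json
from typing import Dict, List


def build_targets(instances: List[dict]) -> List[dict]:
    """Group production instances by datacenter into a list of target entries."""
    groups: Dict[str, List[str]] = {}
    for inst in instances:
        if not inst["production"]:
            continue  # only production databases are added to monitoring
        address = f'{inst["fqdn"]}:{inst["port"]}'
        groups.setdefault(inst["datacenter"], []).append(address)
    return [
        {"labels": {"datacenter": dc}, "targets": sorted(addresses)}
        for dc, addresses in sorted(groups.items())
    ]


# Hypothetical inventory rows, standing in for the result of an inventory query.
inventory = [
    {"fqdn": "db-a.example.org", "port": 3306, "datacenter": "eqiad", "production": True},
    {"fqdn": "db-b.example.org", "port": 3306, "datacenter": "eqiad", "production": True},
    {"fqdn": "db-c.example.org", "port": 3306, "datacenter": "codfw", "production": True},
    {"fqdn": "db-test.example.org", "port": 3306, "datacenter": "eqiad", "production": False},
]

targets = build_targets(inventory)
print(json.dumps(targets, indent=2))
```

Regenerating such a file on a schedule means a host added to the inventory is picked up by monitoring without a manual step, which is also why inventory downtime affects monitoring.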