Portal:Toolforge/Admin/Runbooks/ToolsDBAlmostFull
This alert fires when free disk space on a ToolsDB host is running low. It starts at severity "warning" and escalates to "page" if free space drops below 5%.
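To confirm what the alert is reporting, a small shell sketch can read the free space of the filesystem holding /srv/labsdb and compare it with the 5% paging threshold. It assumes a POSIX-style df that supports -P and -k. The functions free_pct and classify_free are hypothetical helpers, and the percentage may differ slightly from the alert's own calculation (for example because of reserved blocks).

```shell
#!/usr/bin/env bash
# Sketch: report free space on the ToolsDB data filesystem and compare it
# with the 5% paging threshold. Assumes a POSIX df (-P, -k); the alert itself
# may compute free space slightly differently (e.g. reserved blocks).

# Print the integer percentage of available space for a path.
free_pct() {
  local path="${1:-/srv/labsdb}"
  # -P prints one line per filesystem: field 2 = total 1K blocks, field 4 = available.
  df -Pk "$path" | awk 'NR == 2 { printf "%d\n", ($4 * 100) / $2 }'
}

# Map a free-space percentage to the "page" level described in this runbook.
# Only the 5% page threshold is stated, so anything above it is reported
# as "above page threshold" instead of guessing the warning level.
classify_free() {
  local pct="$1"
  if [ "$pct" -lt 5 ]; then
    echo "page: ${pct}% free"
  else
    echo "above page threshold: ${pct}% free"
  fi
}

# Usage on the tools-db host:
#   classify_free "$(free_pct /srv/labsdb)"
```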
Error / Incident
This usually comes in the form of an alert in Alertmanager.
Debugging
Finding what is taking up space
fnegri@tools-db-6:~$ sudo du -hs /srv/labsdb/* |sort -hr |head -10
2.1T /srv/labsdb/data
905G /srv/labsdb/binlogs
20K /srv/labsdb/tmp
16K /srv/labsdb/lost+found
fnegri@tools-db-6:~$ sudo du -hs /srv/labsdb/data/* |sort -hr |head -10
281G /srv/labsdb/data/s53220__quickstatements_p
250G /srv/labsdb/data/s51434__mixnmatch_p
183G /srv/labsdb/data/s53685__editgroups
142G /srv/labsdb/data/ibdata1
95G /srv/labsdb/data/s51698__yetkin
83G /srv/labsdb/data/s51412__data
70G /srv/labsdb/data/s51114__enwp10
58G /srv/labsdb/data/s53952__freebase_p
57G /srv/labsdb/data/s51499__wikiminiatlas
56G /srv/labsdb/data/s51156__petscan
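To see who owns the largest databases, the du output can be annotated with the tool's credential user. This sketch assumes the Toolforge convention that ToolsDB database names start with the credential user followed by two underscores (so s53220__quickstatements_p belongs to s53220). The function annotate_owners is a hypothetical helper; entries that are not user databases, such as ibdata1, are marked with "-".

```shell
#!/usr/bin/env bash
# Sketch: annotate `du -hs /srv/labsdb/data/*` output with the credential user
# that owns each database. Assumes ToolsDB database names follow the
# "<credential user>__<name>" pattern, e.g. s53220__quickstatements_p.
annotate_owners() {
  # du prints "SIZE<TAB>PATH"; read both fields from each input line.
  while IFS=$'\t' read -r size path; do
    local name="${path##*/}"   # drop the directory part of the path
    local user="-"             # default for non-database files like ibdata1
    if [[ "$name" == *__* ]]; then
      user="${name%%__*}"      # everything before the first "__"
    fi
    printf '%s\t%s\t%s\n' "$size" "$user" "$name"
  done
}

# Usage on the tools-db host:
#   sudo du -hs /srv/labsdb/data/* | sort -hr | head -10 | annotate_owners
```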
Common issues
ibdata1 file growing
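Long uncommitted transactions can cause the file /srv/labsdb/data/ibdata1 to grow very quickly, because InnoDB cannot purge the undo log records those transactions might still need, and those records accumulate in the system tablespace. You can check for active transactions with SHOW ENGINE INNODB STATUS\G from a MariaDB console (sudo mariadb on the tools-db host).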
Look for lines such as ---TRANSACTION (0x7f739e977e80), ACTIVE 455393 sec, where the number of active seconds is large (455393 seconds is more than five days).
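On a busy server the InnoDB status output can be long, so it may help to filter it for old transactions. The sketch below assumes transaction header lines look like the example above (---TRANSACTION followed by ACTIVE N sec). long_transactions is a hypothetical helper, and its default threshold of one hour is an arbitrary choice for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list InnoDB transactions that have been ACTIVE for at least a
# threshold number of seconds, by scanning SHOW ENGINE INNODB STATUS\G text.
# Assumes header lines like: ---TRANSACTION (0x7f739e977e80), ACTIVE 455393 sec
long_transactions() {
  local min_secs="${1:-3600}"   # illustrative default: one hour
  awk -v min="$min_secs" '
    /^---TRANSACTION .*ACTIVE [0-9]+ sec/ {
      # Find the number right after the word ACTIVE and compare it numerically.
      for (i = 1; i < NF; i++) {
        if ($i == "ACTIVE" && $(i + 1) + 0 >= min + 0) {
          print $(i + 1), $0   # prefix the age so the output can be sorted
        }
      }
    }
  ' | sort -nr                  # oldest transactions first
}

# Usage on the tools-db host (transactions active for more than a day):
#   sudo mariadb -e 'SHOW ENGINE INNODB STATUS\G' | long_transactions 86400
```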
See phab:T409716 for more details.
Data growth of one of the user databases
If disk space is low because user data is growing, increase the disk size. The data volume is a Cinder volume that can be resized (see Extending a volume).
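When deciding how much to grow the volume, a quick calculation gives the smallest size that leaves a chosen percentage of the disk free at current usage. This is plain arithmetic under stated assumptions: usage is measured in GiB (for example with df), the target free percentage is a local choice rather than a documented policy, and required_size_gib is a hypothetical helper. Extra room for future growth should be added on top of the result.

```shell
#!/usr/bin/env bash
# Sketch: smallest volume size (GiB) that keeps a target percentage free
# at the current usage. Inputs are assumptions supplied by the operator.
required_size_gib() {
  local used_gib="$1" target_free_pct="$2"
  if [ "$target_free_pct" -ge 100 ] || [ "$target_free_pct" -lt 0 ]; then
    echo "target free percentage must be between 0 and 99" >&2
    return 1
  fi
  # size * (100 - free%) / 100 >= used  =>  size >= used * 100 / (100 - free%)
  local denom=$((100 - target_free_pct))
  # Integer ceiling division so the suggested size is never too small.
  echo $(( (used_gib * 100 + denom - 1) / denom ))
}

# Usage: about 3000 GiB in use, 20% free space wanted after the resize:
#   required_size_gib 3000 20   # prints 3750
```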
Related information
Support contacts
The main discussion channel for this alert is #wikimedia-cloud-admin on IRC.
If the situation is not clear or you need additional help, you can also contact the Data Persistence team (#wikimedia-data-persistence on IRC).
Old incidents
Add any incident tasks here!
