Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull

From Wikitech

The ToolsNfsAlmostFull alert fires when the Toolforge NFS server is almost out of disk space. This happens surprisingly often because the NFS share has no quotas.

The procedures in this runbook require admin permissions to complete.

Error / Incident

The Toolforge NFS server is almost out of disk space. This generally means that some space needs to be freed up.

Note that this alert has multiple severity levels. A warning alert fires while there is still much more free space left than when a critical or paging alert fires.

As of 2024-01-03, the NFS server is tools-nfs-2.tools.eqiad1.wikimedia.cloud.
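To confirm how full the share actually is before hunting for large files, a quick check along these lines can be run on the NFS server. This is a minimal sketch that assumes the share is the filesystem containing /srv/tools (the path searched under "Locate disk hogs"); the function name nfs_usage_percent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the used-space percentage (a bare integer,
# without the "%" sign) of the filesystem that contains the given path.
# Defaults to /srv/tools, which is assumed to sit on the exported NFS share.
nfs_usage_percent() {
    # -P forces POSIX output: a header line, then one unwrapped line per
    # filesystem, with the capacity column (e.g. "87%") in field 5.
    df -P "${1:-/srv/tools}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}
```

Called with no argument on the NFS server, it reports the usage of /srv/tools; passing another path checks whichever filesystem holds that path.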

Debugging

Try what was done last time

If the alert fires again only a short time (about a week or so) after the last cleanup, it is usually caused by the same thing as last time. Look at the task for that cleanup to see what was done there, clean up the same things again, and nudge the maintainers involved.
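If the large-file listing from the previous cleanup was kept (an assumption: this runbook does not say whether or where it is stored), comparing it with a fresh listing shows which large files from last time are still there and how much they have grown. A minimal POSIX shell sketch; the function name reappeared_files is hypothetical, and it expects two files in the "<size> KB <path>" format written by the find command under "Locate disk hogs".

```shell
#!/bin/sh
# Hypothetical helper: given an OLD and a NEW listing (lines of the form
# "<size> KB <path>"), print every path that appears in both, formatted as
# "<new> KB (was <old> KB) <path>", largest current size first.
reappeared_files() {
    awk '
        {
            size = $1
            path = $0
            sub(/^[0-9]+ KB /, "", path)   # drop the size prefix, keep spaces in the path
        }
        # First argument: the previous listing. Remember each path and its size.
        FILENAME == ARGV[1] { old[path] = size; next }
        # Second argument: the current listing. Report paths seen last time too.
        (path in old) { printf "%s KB (was %s KB) %s\n", size, old[path], path }
    ' "$1" "$2" | sort -nr
}
```

For example, `reappeared_files tools_large_files_<old date>.txt tools_large_files_$(date +%Y%m%d).txt` lists the repeat offenders, which point to the tools whose maintainers need nudging.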

Locate disk hogs

# ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" | sort -h > tools_large_files_$(date +%Y%m%d).txt

This will take a few hours to complete.
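Once the listing exists, totalling it per directory makes it easier to see which tool or home directory dominates. A minimal POSIX shell sketch; the function name usage_by_prefix is hypothetical, and the default depth of 3 (giving prefixes like /srv/tools/<first-level directory>) is an assumption, since the directory layout under /srv/tools is not described here.

```shell
#!/bin/sh
# Hypothetical helper: total the sizes in a listing (lines of the form
# "<size> KB <path>") per directory prefix made of the first DEPTH path
# components (default 3), largest total first.
usage_by_prefix() {
    awk -v depth="${2:-3}" '
        {
            size = $1
            path = $0
            sub(/^[0-9]+ KB /, "", path)      # drop the size prefix
            n = split(path, part, "/")        # part[1] is empty: paths start with "/"
            prefix = ""
            for (i = 2; i <= n && i <= depth + 1; i++)
                prefix = prefix "/" part[i]
            total[prefix] += size
        }
        # %.0f avoids integer overflow in awks where %d is 32-bit.
        END { for (p in total) printf "%.0f KB %s\n", total[p], p }
    ' "$1" | sort -nr
}
```

For example, `usage_by_prefix tools_large_files_$(date +%Y%m%d).txt 4` groups one level deeper. The totals only cover files over 100M captured by the listing, so they understate each directory's real usage.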

Common issues

Add any new common issues you find here.

Related information

Where?

Provide links to related tools, tickets for previous incidents, and documentation that gives context or additional information. This might include links to the affected service's main page (if any), diagrams, and external links.

Support contacts

Who?

You may choose to list support contacts: individuals who have knowledge about this topic and who may be able to confirm the accuracy of the runbook or help resolve issues related to the error or incident.

Note: Including the template {{:Help:Cloud Services communication}} may be sufficient.

Old incidents

Add any new tasks for incidents you encounter here.