Monitoring/Disk space
You are probably here because Icinga mentioned this link during a DISK space alert.
This page is meant to have separate sections for different types of servers or roles which link to runbooks explaining what to do if a certain server runs out of disk.
This is done so that we can link to a single page from the universal check in the puppet base module and still have different instructions for different types of servers; it avoids having to encode this distinction in puppet code.
Please expand this page with new sections by server role.
Short-term fix
If the server has a small disk (e.g. an old VM), running the following commands might free up a few GB by removing cached or unneeded packages (including old kernels), temporarily fixing the issue.
sudo apt-get autoclean
sudo apt-get autoremove
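If that is not enough, the next step is usually to identify what is consuming the space. A generic sketch, not specific to any server role: df -h shows which filesystem is full, and the du pipeline lists the 20 largest directories on the root filesystem without crossing mount points (adjust the path to whichever filesystem is full):
df -h
sudo du -xh --max-depth=2 / | sort -h | tail -20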
Prometheus
For servers that host Prometheus.
Cloud NFS
Extending an LVM volume (e.g. backup hosts)
Some hosts (e.g. backup hosts) do not provision their entire disk array by default, in case part of it is needed later for a different purpose or a separate filesystem (e.g. a separate Bacula storage device). To allow for future expansion, LVM is used as a device-mapper framework.
Thanks to LVM and the capabilities of modern filesystems, it is possible to extend a logical volume and grow its filesystem online, while it is mounted and in use.
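Before starting, it helps to get a picture of the current storage layout. A quick generic sketch (device, volume group, and mount point names will differ per host):
# lsblk      <-- block devices, partitions and mount points
# pvs        <-- physical volumes and the volume group they belong to
# vgs        <-- volume groups and their free space (VFree)
# lvs        <-- logical volumes and their sizes
# df -h      <-- filesystem usage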
To do so:
- Extend the underlying physical space, or make sure there is unallocated space available. For example, if a new physical disk has been installed, you can add it to the volume group's pool, or you could increase the disk quota of a virtual machine. The exact method depends on the setup, and you must first confirm it is possible: for example, while many PCI-interface block storage devices are supposed to support hot-plugging, that may not be supported by the motherboard vendor or be practical.
- For example, when a new disk/block device "sdz" is made available:
# cfdisk /dev/sdz        <-- create a new LVM partition (type 8e), which will likely be called sdz1; this will destroy its data!!!!
# pvcreate /dev/sdz1
# vgextend hwraid /dev/sdz1
- No matter what was done in step 1, you should now have available disk space in the volume group. To confirm:
# vgs
  VG            #PV #LV #SN Attr   VSize   VFree
  backup2003-vg   1   1   0 wz--n- 399.50g  79.90g
  hwraid          1   1   0 wz--n- 160.09t 110.09t   <--- there is 110 TB free on this volume group
- Assign the desired extra space to the logical volume:
# lvs
  LV      VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data    backup2003-vg -wi-ao---- 319.60g
  backups hwraid        -wi-ao----  50.00t
# lvextend -L+50T /dev/mapper/hwraid-backups
  Size of logical volume hwraid/backups changed from 50.00 TiB (13107200 extents) to 100.00 TiB (26214400 extents).
  Logical volume hwraid/backups successfully resized.
- Now grow the filesystem to take up 100% of the logical volume. This can be done fully online, with the filesystem mounted and in use (e.g. extending the mysql data dir while mysql is running causes no issues), as long as the filesystem is xfs or ext4 (other filesystems should be checked on an individual basis). However, on a slow (spinning disks) and/or busy device, it can take some minutes to apply (~1.5 TB/minute). If you are unsure which filesystem is in use, see the check after this list.
- For an xfs filesystem:
# xfs_growfs /srv/backups
meta-data=/dev/mapper/tank-data  isize=512    agcount=4, agsize=241068800 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=964275200, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=470837, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
- For an ext4 filesystem:
# resize2fs /dev/mapper/hwraid-backups
resize2fs 1.44.5 (15-Dec-2018)
Filesystem at /dev/mapper/hwraid-backups is mounted on /srv/bacula; on-line resizing required
old_desc_blocks = 6400, new_desc_blocks = 12800
The filesystem on /dev/mapper/hwraid-backups is now 26843545600 (4k) blocks long.
- The process is now done, unless you got errors in any of the previous steps. Verify the new sizes:
# lvs
  LV      VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data    backup2003-vg -wi-ao---- 319.60g
  backups hwraid        -wi-ao---- 100.00t
# df -h
...
/dev/mapper/hwraid-backups  100T  47T  54T  47% /srv/bacula
...
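As referenced in the resize step above, if you are unsure whether the filesystem is xfs or ext4, it can be checked before choosing between xfs_growfs and resize2fs. A small sketch, reusing the mount point from the example above:
# findmnt -no FSTYPE /srv/bacula      <-- prints e.g. "xfs" or "ext4"
As an aside, lvextend also accepts the -r/--resizefs flag, which grows the filesystem as part of the same command (e.g. lvextend -r -L+50T /dev/mapper/hwraid-backups); the separate commands shown above make it easier to verify each step's output.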
Both the Debian Administrator's Handbook and the Red Hat Storage Administrator's Guide are good references for further details.