Monitoring/Disk space

You are probably here because Icinga mentioned this link during a DISK space alert.

This page is meant to have separate sections for different server types or roles, linking to runbooks that explain what to do when a given server runs out of disk space.

This is done so that we can link to a single page from the universal check in the puppet base module while still having different instructions for different types of servers; it avoids having to make this distinction in puppet code.

Please expand this page with new sections by server role.

Short term fix

If the server has a small disk (e.g. an old VM), running the following commands might free up a few GB by removing cached package files and unneeded packages (including old kernels), temporarily fixing the issue.

sudo apt-get autoclean

sudo apt-get autoremove
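
If that is not enough, find out what is actually consuming the space before deleting anything; a minimal sketch (the paths are only examples, adjust them to the host):

sudo df -h                       # confirm which filesystem is actually full
sudo du -xh -d1 / | sort -h      # largest top-level directories, staying on one filesystem
sudo du -xh -d1 /var | sort -h   # drill down into whichever directory is biggest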

Prometheus

For servers that host Prometheus.

Cloud NFS

Extending an LVM volume (e.g. backup hosts)

Some hosts (e.g. backup hosts) do not provision their entire disk array by default, so that the remaining space can be used for a different purpose or a separate filesystem later in the host's life cycle (e.g. a separate Bacula storage device). To allow for this future expansion, LVM is used as the device mapper framework.

Thanks to LVM and the capabilities of modern filesystems, a logical volume and the filesystem on it can be extended online ("hot"), without unmounting or downtime.
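
Before starting, it can help to inspect the current LVM layout with the standard reporting commands:

# pvs <-- physical volumes and the volume group they belong to
# vgs <-- volume groups and their total and free space
# lvs <-- logical volumes and their current sizes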

To do so:

  • Extend the underlying physical space, or make sure there is already unallocated space available. For example, if a new physical disk has been installed you can add it to the volume group's pool of physical volumes, or, for a virtual machine, you can increase its disk quota. The exact method depends on the setup, and you must first confirm it is feasible: for example, while many PCI-interface block storage devices are in theory hot-pluggable, this may not be supported by the motherboard vendor, or may not be practical.
    • Example, for the case where a new disk/block device "sdz" is made available:
# cfdisk /dev/sdz <-- create a new partition of type Linux LVM (type 8e on MBR/DOS partition tables), which will likely be called sdz1; this will destroy any existing data on the disk!
# pvcreate /dev/sdz1
# vgextend hwraid /dev/sdz1
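    • If instead the host is a virtual machine whose existing virtual disk was grown in place, a minimal sketch (assuming a hypothetical whole-disk physical volume "vdb", with no partition table in between) would be:
# pvresize /dev/vdb <-- make LVM pick up the device's new size; the extra extents become free space in its volume group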
  • No matter what was done in step 1, you should now have free space available in the volume group. To confirm:
# vgs
  VG            #PV #LV #SN Attr   VSize   VFree  
  backup2003-vg   1   1   0 wz--n- 399.50g  79.90g
  hwraid          1   1   0 wz--n- 160.09t 110.09t  <--- there is 110 TB free on this volume group
  • Assign the desired extra space to the logical volume:
# lvs
  LV      VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data    backup2003-vg -wi-ao---- 319.60g                                                    
  backups hwraid        -wi-ao----  50.00t

# lvextend -L+50T /dev/mapper/hwraid-backups
  Size of logical volume hwraid/backups changed from 50.00 TiB (13107200 extents) to 100.00 TiB (26214400 extents).
  Logical volume hwraid/backups successfully resized
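
As a side note, lvextend can also take all of the remaining free space in the volume group rather than a fixed increment, and its -r/--resizefs option grows the filesystem in the same step (which would make the manual resize in the next step unnecessary). Either variant, sketched:

# lvextend -l +100%FREE /dev/mapper/hwraid-backups <-- use all remaining free space in the volume group
# lvextend -r -L+50T /dev/mapper/hwraid-backups <-- extend and grow the filesystem in one go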
  • Now grow the filesystem to take up 100% of the logical volume. This can be done fully online, with the filesystem mounted and in use (e.g. extending the mysql data dir while mysql is running causes no issues), as long as the filesystem is xfs or ext4 (other filesystems should be checked on an individual basis). However, on a slow (spinning disks) and/or busy device it can take some minutes to apply (~1.5 TB/minute).
    • For an xfs filesystem:
# xfs_growfs /srv/backups
meta-data=/dev/mapper/tank-data  isize=512    agcount=4, agsize=241068800 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=964275200, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=470837, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
    • For an ext4 filesystem:
# resize2fs /dev/mapper/hwraid-backups
resize2fs 1.44.5 (15-Dec-2018)
Filesystem at /dev/mapper/hwraid-backups is mounted on /srv/bacula; on-line resizing required
old_desc_blocks = 6400, new_desc_blocks = 12800
The filesystem on /dev/mapper/hwraid-backups is now 26843545600 (4k) blocks long.
  • The process is now done, unless you got errors in any of the previous steps. To verify:
# lvs
  LV      VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data    backup2003-vg -wi-ao---- 319.60g                                                    
  backups hwraid        -wi-ao---- 100.00t

# df -h
...
/dev/mapper/hwraid-backups       100T   47T   54T  47% /srv/bacula
...

Both the Debian Administrator's Handbook and the Red Hat Storage Administrator's Guide are good references for further details.