Incident documentation/20151024-LabsNFS-Lag

Summary

NFS lagged for all Labs instances when I (milimetric) ran

tail -n +2

on a huge 35GB file in /data/project/milimetric/. Reading the file over NFS used up all the bandwidth.


Timeline

  • 2015-10-24 00:52:44 UTC Started the operation
  • 2015-10-24 01:12:42 UTC Coren alerted me on IRC
  • 2015-10-24 01:33:16 UTC I killed the operation


Conclusions

  • I could have asked someone from ops to do the file operation locally on labstore1002, which would have kept the read off NFS.
  • I could have rate-limited the read with pv -L.
  • I probably should have just fixed the source file on stat1002 (where it originally came from) and re-copied it to labs.
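
The rate-limiting option above could look roughly like the sketch below. It is only an illustration: the file names, the temporary directory and the 10 MiB/s cap are assumptions, not values from the incident, and a tiny stand-in file replaces the 35GB one so the commands can be tried safely. The throttled_read helper is hypothetical; it falls back to an unthrottled cat when pv is not installed.

```shell
#!/bin/sh
# Sketch: strip the first line of a file while capping read throughput with pv.
# All paths and the rate limit are illustrative assumptions.

workdir=$(mktemp -d)
src="$workdir/big_input.tsv"            # stand-in for the large file on NFS
dst="$workdir/big_input.noheader.tsv"   # result without the first line

# Small sample input so the sketch runs anywhere
printf 'header\nrow1\nrow2\n' > "$src"

# Hypothetical helper: read a file at no more than the given rate.
# pv -L limits the transfer rate (bytes per second, K/M/G suffixes allowed);
# pv -q suppresses the progress display.
throttled_read() {
    if command -v pv >/dev/null 2>&1; then
        pv -q -L "$2" "$1"
    else
        echo "pv not found, reading without a rate limit" >&2
        cat "$1"
    fi
}

# Same operation as in the summary (tail -n +2), but fed through the throttle
throttled_read "$src" 10M | tail -n +2 > "$dst"
```

A sensible limit depends on how much bandwidth the shared NFS server can spare, so it would be worth agreeing on a value with ops before running this on a real file.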