Incident documentation/20150625-LabsNFSOutage

From Wikitech
Jump to: navigation, search

Summary

There was a 12 minute Labs NFS outage (and subsequently Tools outage) caused by human error (reboot entered on the wrong terminal accidentally). The system came back up, the arrays were assembled and NFS was back up in 12 minutes.

Timeline

16:35: Yuvi accidentally types reboot into labstore1002 instead of a labs instance awaiting NFS removal.

16:38: Machine comes back up, Coren starts assembling arrays and bringing the NFS server back up

16:47: Instances all recovered from NFS stall, everything working ok


Actionables