Monitoring/Long running screens


The Icinga check for "long running screen or tmux" processes was added in https://phabricator.wikimedia.org/T165348.

The reasoning was: "We should flag/alert long-running screen sessions, these are usually a sign of work which was forgotten or should rather be puppetised or launched by cron"


There are different options to resolve an alert like this. Either determine that the work has indeed been forgotten and the session can be closed, or determine that this is a kind of host where long-running screens are expected, in which case the host should be white-listed to exclude it from the monitoring check.


The user name and PID of the process in question are part of the script output. If the user name matches the user's IRC nick, they should already get highlighted.
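
To cross-check what the alert reports, the same kind of information can be gathered by hand on the host. The following is a minimal sketch, not the actual Icinga check: the function names and the default threshold of 4 days (345600 seconds) are assumptions made for illustration, and it relies on the procps `ps` field `etimes` (elapsed time in seconds).

```shell
#!/bin/sh
# Sketch only; this is NOT the production Icinga check.
# Function names and the default threshold are hypothetical.

# Reads "USER PID ELAPSED_SECONDS COMMAND" lines on stdin and keeps
# screen/tmux processes that have been running longer than $1 seconds.
filter_long_running() {
    awk -v max="$1" '
        (tolower($4) == "screen" || $4 ~ /^tmux/) && $3 > max {
            print $1, $2, $3, $4
        }'
}

# Lists candidates on the local host. The trailing "=" on each field
# suppresses the ps header line. Threshold defaults to 4 days (assumed).
list_long_running() {
    ps -eo user=,pid=,etimes=,comm= | filter_long_running "${1:-345600}"
}
```

After sourcing the file, `list_long_running 86400` would print user, PID, elapsed seconds and command name for every screen or tmux process older than one day.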

What you can do:

  • ping the user, ask them whether they still need the screen/tmux session, and ask them to close it if they do not

or

  • go to the host in question yourself and check what is in the screen
    • trick to get into the screen of another user, run in sequence: sudo -s; su $username; script /dev/null; screen -x (why? [1])
    • if things look inactive or forgotten, close the screen; otherwise go back to asking the user

or

  • white-list the host in question because long-running screens are expected
    • make a puppet change and set "monitor_screens: false" in Hiera for a role or host
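
The manual attach steps listed above can also be collapsed into a single command. This is a hedged sketch that assumes sudo rights on the host and the util-linux version of `script` (which supports `-q` and `-c`); the helper name `screen_attach_cmd` is hypothetical. Attaching from a shell obtained via su typically fails because the terminal device still belongs to the original login user; `script` allocates a fresh pseudo-terminal owned by the target user, which screen is able to open.

```shell
#!/bin/sh
# Sketch only: prints a one-line equivalent of the manual
# "sudo -s; su $username; script /dev/null; screen -x" sequence.
# The helper name is hypothetical; review the command before running it.

# $1 = user name owning the screen session
screen_attach_cmd() {
    # script(1) runs "screen -x" inside a new pseudo-terminal and
    # discards the typescript by writing it to /dev/null.
    printf 'sudo -u %s script -q -c "screen -x" /dev/null\n' "$1"
}
```

Calling `screen_attach_cmd someuser` only prints the command, so it can be checked and then pasted into the terminal on the affected host.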