Monitoring/Latency

From Wikitech

Icinga has alerts for increased latency at the Mediawiki appserver level.

In June 2021 we had a case where several appservers entered a weird state that caused latency alerts. The solution was to restart PHP on the affected appservers. The following Prometheus query was used to find the appservers that were in the weird state:

(phpfpm_statustext_processes{cluster="appserver",state="idle"}) < 10

Then a

sudo restart-php7.2-fpm

This reoccurred 3 times in 2 days, we ended up restarting all appservers. Phabricator task: https://phabricator.wikimedia.org/T285634