User:CDanis (WMF)/Use more heatmaps
From Wikitech
An illustration of a real incident on Wikimedia's API appserver cluster: load increased, and a small group of older servers saturated their CPUs, greatly increasing tail latency for users. Asking our load balancer to send fewer queries to the older servers restored user happiness.
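The caption doesn't say how the load balancer was told to favor the newer servers. As an assumption, the sketch below uses generic weighted random selection: each backend gets a weight, and its share of queries is its weight divided by the total. The server names and weights are hypothetical and do not reflect Wikimedia's actual configuration.

```python
import random
from collections import Counter

# Hypothetical pool: names and weights are illustrative only.
# Older servers get half the weight, so each receives half as many queries.
POOL = {
    "new-1": 10,
    "new-2": 10,
    "new-3": 10,
    "old-1": 5,
    "old-2": 5,
}


def expected_share(pool):
    """Fraction of queries each server should receive under weighted balancing."""
    total = sum(pool.values())
    return {name: weight / total for name, weight in pool.items()}


def pick_server(pool, rng):
    """Choose one backend at random, with probability proportional to its weight."""
    names = list(pool)
    return rng.choices(names, weights=[pool[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a repeatable demo
    trials = 40_000
    counts = Counter(pick_server(POOL, rng) for _ in range(trials))
    for name, share in expected_share(POOL).items():
        print(f"{name}: expected {share:.1%}, observed {counts[name] / trials:.1%}")
```

With these weights each newer server expects 25% of traffic and each older one 12.5%, so lowering an older server's weight directly reduces the CPU load it sees.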
Plots from Wikimedia's appserver pool showing low average CPU utilization across nodes. That's not the whole story: the heatmap shows there are actually two distinct groups of nodes, one with low utilization and one with medium/high utilization (plus a couple of debug servers with near-0 utilization).
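To show why the average hides the two groups, here is a minimal sketch that computes both views of one snapshot: the mean, and a histogram of nodes per utilization bucket (one time slice, or column, of a heatmap). The per-node utilization values and the 10% bucket width are hypothetical, chosen only to mimic the shape described in the caption.

```python
from collections import Counter

BUCKET_WIDTH = 10  # percent per heatmap row (illustrative choice)
NUM_BUCKETS = 100 // BUCKET_WIDTH


def mean(values):
    """Average utilization: the single line a typical dashboard would plot."""
    return sum(values) / len(values)


def heatmap_column(values):
    """Count how many nodes fall into each utilization bucket.

    One call gives one time slice (one column) of a heatmap; stacking
    columns for successive timestamps gives the full plot.
    """
    # Clamp 100% into the top bucket so it isn't dropped.
    counts = Counter(min(int(v // BUCKET_WIDTH), NUM_BUCKETS - 1) for v in values)
    return {
        f"{b * BUCKET_WIDTH}-{(b + 1) * BUCKET_WIDTH}%": counts.get(b, 0)
        for b in range(NUM_BUCKETS)
    }


# Hypothetical snapshot of per-node CPU utilization (%): a large lightly
# loaded group, a smaller busy group, and two nearly idle debug hosts.
cpu = [10.0] * 60 + [70.0] * 15 + [1.0] * 2

if __name__ == "__main__":
    print(f"mean: {mean(cpu):.1f}%")
    for bucket, n in heatmap_column(cpu).items():
        print(f"{bucket:>8} {'#' * n}")
```

The mean comes out around 21%, yet no node is anywhere near 21%: the histogram shows 60 nodes in the 10-20% bucket, 15 in the 70-80% bucket, and the two debug hosts in 0-10%.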