User:CDanis/Use more heatmaps

User page
Discussion

English

Read
View source
View history

Tools

Tools

Actions

Read
View source
View history

General

What links here
Related changes
User contributions
Logs
View user groups
Special pages
Permanent link
Page information

Print/export

Create a book
Download as PDF
Printable version

From Wikitech

An illustration of a real incident on Wikimedia's API appserver cluster: as load increased, a small group of (older) servers saturated their CPUs, greatly increasing tail latency for users. Asking our load balancer to send fewer queries to the older servers restored user happiness.
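Why a few saturated servers show up in tail latency long before they move the average: a minimal sketch with made-up numbers. The 50-server pool, the 100 ms and 1000 ms latencies, and the helper names `percentile` and `request_latencies` are hypothetical and are not measurements from the incident above. With 2 of 50 servers slow, the mean rises only modestly and the median and 95th percentile do not move at all, while the 99th percentile is set entirely by the slow group.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def request_latencies(servers):
    """Expand (requests_served, latency_ms) pairs into one latency sample per request."""
    out = []
    for requests, latency_ms in servers:
        out.extend([latency_ms] * requests)
    return out


# Hypothetical pool: 48 newer servers answering in 100 ms and 2 older, CPU-saturated
# servers answering in 1000 ms, each receiving an equal share (100 requests) of traffic.
pool = [(100, 100)] * 48 + [(100, 1000)] * 2
lat = request_latencies(pool)

mean = sum(lat) / len(lat)
print(f"mean={mean:.0f} ms p50={percentile(lat, 50)} ms "
      f"p95={percentile(lat, 95)} ms p99={percentile(lat, 99)} ms")
# prints: mean=136 ms p50=100 ms p95=100 ms p99=1000 ms
```

Only 4% of requests land on the slow servers, which is enough to own the 99th percentile but not the 95th.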

Plots from Wikimedia's appserver pool showing a low average CPU utilization across nodes. That's not the whole story: the heatmap shows there are actually two distinct groups of nodes, one with low utilization and one with medium/high utilization (plus a couple of debug servers with near-zero utilization).
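What a utilization heatmap computes, as a minimal sketch: at each timestamp, count how many nodes fall into each utilization bucket, then lay those counts out with time on one axis and bucket on the other, colouring each cell by its count. The node values, the 5% bucket width, and the names `bucket_counts` and `heatmap` are hypothetical, chosen only to mimic the shape described in the caption (a lightly loaded group, a busier group, and two near-idle debug hosts).

```python
from collections import Counter


def bucket_counts(values, bucket_width=5):
    """Count nodes per utilization bucket [lo, lo + bucket_width); 100% goes in the top bucket."""
    counts = Counter()
    for v in values:
        lo = min(int(v // bucket_width) * bucket_width, 100 - bucket_width)
        counts[lo] += 1
    return dict(sorted(counts.items()))


def heatmap(samples_over_time, bucket_width=5):
    """Build a grid mapping each bucket to its node count at every timestamp (one column per sample)."""
    columns = [bucket_counts(s, bucket_width) for s in samples_over_time]
    return {b: [col.get(b, 0) for col in columns] for b in range(0, 100, bucket_width)}


# Hypothetical per-node CPU utilization (%) at two timestamps.
group_a = [8, 9, 11, 12, 13] * 8   # 40 lightly loaded nodes
debug = [0.5, 1.0]                 # 2 near-idle debug hosts
t0 = group_a + [52, 55, 58, 61] * 3 + debug   # 12 busier nodes
t1 = group_a + [60, 63, 66, 69] * 3 + debug   # the busier group heats up further

grid = heatmap([t0, t1])
for bucket in sorted(grid, reverse=True):
    if any(grid[bucket]):
        print(f"{bucket:3d}-{bucket + 5:<3d}% {grid[bucket]}")
print("pool average at t0:", round(sum(t0) / len(t0), 1))  # 20.4
```

The pool average (about 20%) suggests uniformly light load, while the non-empty rows of `grid` show three separate bands with an empty gap between the light and busy groups. Any plotting tool that can colour a grid by value can render `grid` directly.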

Retrieved from "https://wikitech.wikimedia.org/w/index.php?title=User:CDanis/Use_more_heatmaps&oldid=1859433"

This page was last edited on 6 March 2020, at 20:36.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for details.
