Performance

Wikimedia Performance is an initiative to provide tools to analyze and improve the site performance of Wikipedia. We do this through a combination of live instrumentation, continuous monitoring, and debug profiling. Created in 2014 and led by the Wikimedia Performance Team from 2014–2023, it is now a collaboration between MediaWiki Engineering, SRE Observability, and QTE.

Guides

The "performance practices" guides help set direction. Use them to guide new development, or to periodically find areas for improvement in current code by looking for where it differs from the recommended practices.

The "measure" guides help assess the actual performance of existing code, and help iterate on proposed changes before, during, and after their development.

Tools

Recommended tools to analyze or improve performance, aimed at MediaWiki developers and SRE.

Frontend synthetic testing

  • Page drilldown (Grafana): Detailed performance measures from synthetic testing of page views and user journeys (split by URL, connection type, browser, and more). The same dashboard and feature set covers synthetic testing from both desktop browsers and our Mobile Device Lab.
  • Fresnel CI: Easy access to frontend performance stats during code review (a selection of synthetic and real-user metrics).

Frontend development

See MediaWiki Engineering#Frontend development.

Frontend real-user traffic

  • Navigation Timing (Grafana): Page load time and other real-user metrics from MediaWiki page views, collected via Navigation Timing and Paint Timing APIs (split by platform, country, and browser).
  • Real user monitoring (Grafana/Prometheus): User Experience metrics collected via Navigation Timing and Paint Timing APIs and stored in Prometheus.
  • responseStart by CDN host (Grafana): Roundtrip latency from browsers, split by CDN host. Allows for natural experimentation and regression detection around Wikimedia CDN, e.g. Linux kernel changes, and upgrades or configuration changes to Varnish, HAProxy or ATS.
  • CrUX report (Grafana): Independent copy of Google's periodically published Chrome UX Report and the Core Web Vitals as measured from eligible Chrome-with-Google-account users.
  • CPU benchmark (Grafana): Collected as part of our Navigation Timing beacon to help assess baseline performance. Also powers the AS Report.
  • Google Web Vitals (Grafana): Google Web Vitals metrics important for a good user experience, collected by us.
  • AS Report (performance.wikimedia.org): Periodic comparison of backbone connectivity from different Internet service providers, based on anonymised Navigation Timing and CPU benchmark datasets.
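
As a rough illustration of how real-user metrics like those listed above can be split by a dimension such as platform, the sketch below groups beacon records and takes the median per group. The field names `responseStart` and `loadEventEnd` follow the attribute names of the Navigation Timing API; the `platform` key, the sample values, and the `median_by` helper are hypothetical and are not taken from the NavigationTiming extension or the dashboards.

```python
from collections import defaultdict
from statistics import median


def median_by(beacons, dimension, metric):
    """Group beacon records by `dimension` and return the median `metric` per group."""
    groups = defaultdict(list)
    for beacon in beacons:
        groups[beacon[dimension]].append(beacon[metric])
    return {key: median(values) for key, values in groups.items()}


# Hypothetical beacons; timings are milliseconds since the start of navigation.
beacons = [
    {"platform": "desktop", "responseStart": 120.0, "loadEventEnd": 900.0},
    {"platform": "desktop", "responseStart": 200.0, "loadEventEnd": 1500.0},
    {"platform": "mobile", "responseStart": 300.0, "loadEventEnd": 2500.0},
    {"platform": "mobile", "responseStart": 500.0, "loadEventEnd": 4100.0},
    {"platform": "mobile", "responseStart": 400.0, "loadEventEnd": 3000.0},
]

if __name__ == "__main__":
    print(median_by(beacons, "platform", "responseStart"))
    print(median_by(beacons, "platform", "loadEventEnd"))
```

The median is used here only to keep the sketch short; production dashboards commonly report higher percentiles as well.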

Analyze backend traffic

  • MediaWiki Entrypoint Profiling (Grafana): Break down the backend latency of any request route by attributing time spent to high-level components in core and extensions.
  • PHP Flame Graphs (performance.wikimedia.org): Break down MediaWiki backend latency to individual function calls from sampled stack traces (reported hourly and daily, split by service entry point).
  • WikimediaDebug: Capture per-request data when debugging in production, e.g. ad hoc or when staging a deployment.
    • Excimer UI: Flame graphs to visualise time spent through detailed stack traces from function calls.
    • Verbose logs: Capture verbose debug messages in Logstash.
    • XHGui: Analyze server performance through memory use and function call count.
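
The idea behind building flame graphs from sampled stack traces can be sketched in a few lines: each sample is one call stack, identical stacks are counted, and the share of samples containing a function approximates the share of time spent in it. This is a generic illustration of the "folded stack" text format used by common flame graph tooling, not the Excimer or Arc Lamp implementation; the frame names, sample data, and helper functions are made up.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks (outermost frame first) into 'a;b;c count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


def inclusive_share(samples, function):
    """Fraction of samples in which `function` appears anywhere on the stack."""
    if not samples:
        return 0.0
    hits = sum(1 for stack in samples if function in stack)
    return hits / len(samples)


# Hypothetical samples, as if captured at a fixed interval during requests.
samples = [
    ["main", "render", "parse"],
    ["main", "render", "parse"],
    ["main", "render", "query"],
    ["main", "render"],
]

if __name__ == "__main__":
    for line in fold_stacks(samples):
        print(line)
    print(inclusive_share(samples, "parse"))
```

Because the input is sampled, the counts are statistical estimates: functions that run briefly may not appear at all unless enough samples are aggregated.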

Backend development

See MediaWiki Engineering#Backend development.

Public data and software

Infrastructure diagram.

Public datasets and software relating to performance that may be of interest:

  • AS Report - periodic reports on effective performance of Internet providers around the world, based on Navigation Timing and CPU benchmark datasets.
  • NavigationTiming extension - JavaScript client to collect Navigation Timing and Paint Timing API metrics and send performance beacons.
  • Excimer - Low-overhead sampling profiler and interrupt timer for PHP.
  • Arc Lamp - Excimer client to collect profile samples from a running PHP production application, and aggregate these into flame graphs.
  • ResourceLoader - MediaWiki's delivery system in PHP for JavaScript, CSS, interface icons, and localisation text.

See also:

Essays