Arc Lamp

From Wikitech

This is the internal runbook for deploying and monitoring Arc Lamp. Arc Lamp collects stack traces from all MediaWiki production traffic, using php-excimer, and publishes daily aggregated flame graphs and trace logs to https://performance.wikimedia.org/php-profiling/.

Service

Arc Lamp data originates from the Arc Lamp client (configured in operations/mediawiki-config.git:/Profiler.php) that runs alongside MediaWiki on the application servers. The samples are collected with php-excimer, and sent to a Redis instance on the arclamp host.

The Arc Lamp service then reads the stream of samples from Redis, and produces trace logs on disk. The flame graphs are generated with brendangregg/FlameGraph.
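The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not the actual arclamp-log.py: it assumes each sample arrives as a single semicolon-separated stack (outermost frame first), and the function name `fold_samples` and the demo frames are hypothetical. The output uses the folded "stack count" format that flamegraph.pl from brendangregg/FlameGraph accepts as input.

```python
from collections import Counter
from typing import Iterable, List


def fold_samples(samples: Iterable[str]) -> List[str]:
    """Aggregate raw stack samples into folded-stack lines.

    Assumption: each sample is one semicolon-separated stack, e.g.
    "index.php;MediaWiki::run;Parser::parse". The real message format
    used between the client and the service may differ.
    """
    counts = Counter()
    for sample in samples:
        stack = sample.strip()
        if stack:  # ignore blank or whitespace-only samples
            counts[stack] += 1
    # Sort by stack so the output is deterministic.
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    demo = [
        "index.php;MediaWiki::run;Parser::parse",
        "index.php;MediaWiki::run;Parser::parse",
        "index.php;MediaWiki::run;OutputPage::output",
    ]
    for line in fold_samples(demo):
        print(line)
```

In the production setup the samples would come from the Redis stream rather than an in-memory list, and the folded lines would be written to the trace logs on disk before being rendered as SVG.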

Stats from the Arc Lamp client are sent to Graphite. Stats from the Arc Lamp service go to Prometheus.

Architecture of Arc Lamp (as of December 2019).

Provision

We currently run one arclamp#### host in each core data center (Eqiad and Codfw).

There's also an equivalent deployment-webperf host in the Beta Cluster.

The service runs as multiple independent processes, including:

  • excimer*-log: Continuous service, instances of arclamp-log.py (one per stream).
  • arclamp_generate_svgs: Systemd timer (every 15min).
  • arclamp_compress_logs: Systemd timer (every 1h).

Deployment

First, review the recent logs and start following them on the primary arclamp server:

krinkle@arclamp1001$ sudo journalctl -n2000 -g 'arclamp[_-]|excimer-' -f
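To see which log lines the filter above selects, the pattern can be tried out in Python. This is a sketch under two assumptions: that `journalctl -g` applies the pattern to the message text and treats an all-lowercase pattern as case-insensitive (as described in the journalctl manual), and that Python's `re` behaves the same as journalctl's regex engine for a pattern this simple. The sample messages are hypothetical.

```python
import re

# Same pattern as passed to `journalctl -g` above.
# re.IGNORECASE mirrors journalctl's handling of all-lowercase patterns.
PATTERN = re.compile(r"arclamp[_-]|excimer-", re.IGNORECASE)


def matches(message: str) -> bool:
    """Return True if the pattern occurs anywhere in the message."""
    return PATTERN.search(message) is not None


if __name__ == "__main__":
    # Hypothetical log messages, for illustration only.
    examples = [
        "Started arclamp_generate_svgs.service",
        "arclamp-log: connected",
        "excimer-log: wrote samples",
        "Started nginx.service",
    ]
    for msg in examples:
        print(matches(msg), msg)
```

The character class `[_-]` covers both the timer names (arclamp_generate_svgs, arclamp_compress_logs) and the arclamp-log.py process, while `excimer-` covers the excimer*-log services only where the name continues directly with a hyphen.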

Then, after landing the change in Gerrit, use a deployment server to prepare /srv/deployment/performance/arc-lamp and deploy with Scap.

Monitoring

Host migration

For how to migrate to a new host, refer to the past arclamp1001 upgrade (T319434).

History

  • In 2014, we switched MediaWiki from PHP5 to HHVM, which featured the Xenon sampling profiler. A Redis instance was added to mwlog to collect stack trace samples from running MediaWiki requests. The samples were received by arclamp-log.py on the same mwlog host, which aggregated them into periodic logs and flame graphs hosted locally. These were then served by an internal Apache server, exposed via a proxy under performance.wikimedia.org/xenon.
  • In 2017, we transitioned from HHVM to PHP7, and developed php-excimer as a replacement for HHVM's Xenon (T205059).
  • From 2017 to 2019, a concerted effort consolidated various performance services (T158837). In 2018, we moved the arclamp-log process, flame graph generation, and flame graph storage from mwlog hosts to a dedicated webperf-2 host (T195312).
  • In 2022, the webperf-2 role was renamed to arclamp (T319434).
  • In 2023, the Redis instance moved from mwlog to a dedicated arclamp server (T327277).
  • In 2023, storage was moved to Swift and retention increased from 3 months to 2+ years (T200108).

See also