SLO/Charts
Status: approved
Organizational
Service
- chart-renderer service, which renders chart JSON blobs to SVG and ECharts JSON blobs on demand
Extensions
- Extension:Chart in MediaWiki, providing the {{#chart:}} parser function
- Extension:JsonConfig's action API modules for fetching data from the central wiki
Teams
The Reader Growth team is responsible for the feature. You can reach them on Slack in #talk-to-reader-growth or by e-mail at reader-growth-team@wikimedia.org. The team maintains the Chart extension and the chart-renderer service, and deploys changes to the service.
The Site Reliability Engineering team is responsible for maintaining the production infrastructure and instances that the chart-renderer service runs on.
Architectural
Environmental dependencies
- Extension:Chart runs inside MediaWiki, and so shares MediaWiki's environment (the Kubernetes cluster and related infrastructure)
- The chart-renderer service runs inside the Kubernetes cluster
- Browser ecosystem issues could prevent Charts from displaying or operating correctly. Issues with unsupported browsers should be filtered out of the Chart SLIs.
- Community-editable Lua modules, chart definitions or tabular data could prevent Charts from displaying or operating correctly; such failures should be excluded from the Chart SLIs.
Service dependencies
Charts Service
The Charts service takes input and produces output; it has no runtime dependencies other than the infrastructure it runs on.
It also depends on several upstream npm packages, which are listed in package.json.
When changing the major version of the ECharts package in the chart-renderer service, coordinate the change with the corresponding entry in Extension:Chart and deploy both changes together (ideally in the same deploy window). Also see this ADR for guidance on how to perform this upgrade without breaking the interactive version of charts on the client. It's recommended that you reach out to the Chart extension maintainers for help, as upgrading ECharts to a version that isn't backwards compatible takes some time and effort.
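Before scheduling a deploy, it can help to check whether the two ECharts version entries actually differ in major version. The following is a minimal Python sketch under stated assumptions: the helper names `major_version` and `majors_match` are hypothetical, and the version strings are passed in by hand because where Extension:Chart records its ECharts version is not specified here.

```python
import re


def major_version(spec: str) -> int:
    """Extract the major version from a spec such as '5.4.3' or '^5.4.3'.

    Hypothetical helper: it simply takes the first run of digits, which is
    enough for plain semver-style strings but not for complex ranges.
    """
    match = re.search(r"\d+", spec)
    if match is None:
        raise ValueError(f"no version number found in {spec!r}")
    return int(match.group())


def majors_match(renderer_spec: str, extension_spec: str) -> bool:
    """Return True if both specs refer to the same ECharts major version."""
    return major_version(renderer_spec) == major_version(extension_spec)
```

If `majors_match` returns False for the versions about to be deployed, the coordinated upgrade described above applies.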
Parsoid
Charts relies on fragment support in Parsoid. This is now enabled everywhere, but at the time of writing it was only used by Charts and Wikifunctions. If something breaks in how Charts appear on wiki pages, the cause might be a Parsoid bug that would go unnoticed for most other types of content.
MediaWiki dependencies
Extension:Chart
Extension:Chart has a firm dependency on Extension:JsonConfig, which provides the infrastructure for JSON Data: pages. Chart definitions are stored on Commons, which means we also have a firm dependency on the s4 database section.
There is a runtime dependency on the chart-renderer service, which must be running to produce formatted output.
The Chart extension also contains JavaScript code that rerenders charts client-side, in the user's browser. Each chart is initially rendered server-side by the chart-renderer service, and sent to the client as an SVG, along with a JSON blob detailing the ECharts configuration that was used to render the chart. The client-side code waits until the user scrolls the chart into view, then loads ECharts and asks it to rerender the chart in the browser, replacing the server rendering. This allows our users to benefit from additional interactivity features in ECharts.
Extension:JsonConfig
Extension:JsonConfig relies on Extension:Scribunto to provide Lua integration features.
Extension:Scribunto
Scribunto provides low-level Lua integration in MediaWiki and is required for the transform feature in JsonConfig and Chart. Scribunto has a runtime dependency on the Lua interpreter.
Client-facing
Users
Feature users:
- Readers
- Editors who add charts to articles, editors who create new chart definitions
- Editors who create new tabular data sets for use in charts
Clients
chart-renderer's client is Extension:Chart running in MediaWiki. The extension feeds the service a JSON blob containing the chart format and tabular data, and gets back a blob of HTML/SVG.
The returned data is incorporated into HTML output, so the service should be considered a security surface.
Request Classes
Rendering a chart from parser cache is different from rendering a chart directly from the chart service. An edit to a chart or a page with a chart will trigger a request to the chart-renderer service. When another user then views the page/chart and the page/chart is in the parser cache, it will be served from the parser cache without hitting the chart-renderer service.
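The difference between the two request classes can be illustrated with a toy model. This is only a sketch: `ParserCacheModel`, its methods, and the `render` callable are hypothetical names, and the real parser cache has expiry and keying rules that are not modelled here.

```python
from typing import Callable


class ParserCacheModel:
    """Toy model of edits and views against a cache of rendered charts."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render                    # stands in for chart-renderer
        self._definitions: dict[str, str] = {}   # page -> chart definition
        self._cache: dict[str, str] = {}         # page -> rendered output
        self.renderer_calls = 0                  # requests that hit the service

    def _render_and_cache(self, page: str) -> str:
        self.renderer_calls += 1
        self._cache[page] = self._render(self._definitions[page])
        return self._cache[page]

    def edit(self, page: str, definition: str) -> None:
        """An edit always triggers a request to the renderer."""
        self._definitions[page] = definition
        self._render_and_cache(page)

    def view(self, page: str) -> str:
        """A view is served from the cache when possible."""
        if page in self._cache:
            return self._cache[page]
        return self._render_and_cache(page)

    def evict(self, page: str) -> None:
        """Drop the cached output, as a cache expiry would."""
        self._cache.pop(page, None)
```

In this model, only edits and cache misses increase `renderer_calls`, which is why traffic to the chart-renderer service can be far lower than page view traffic.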
Service Level Indicators (SLIs)
- chart-renderer service: Combined latency-availability SLI: The percentage of all requests that complete within 200 milliseconds and receive a non-error response (HTTP 200).
- Chart client side rendering: Availability: The percentage of client-side rendering attempts that successfully display a chart.
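As a rough illustration of how the two SLIs above could be computed from raw data, here is a minimal Python sketch. The `Request` record, the function names, and the choice to count a request of exactly 200 ms as within the threshold are assumptions made for this example; they are not taken from the service's actual instrumentation.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_MS = 200  # threshold stated in the SLI definition


@dataclass
class Request:
    status: int          # HTTP status code returned by the service
    duration_ms: float   # time taken to serve the request


def is_good(req: Request) -> bool:
    """A request is good only if it is both successful and fast enough."""
    # Assumption: "within 200 milliseconds" is treated as inclusive.
    return req.status == 200 and req.duration_ms <= LATENCY_THRESHOLD_MS


def combined_sli(requests: list[Request]) -> float:
    """Percentage of good requests; an empty window counts as 100%."""
    if not requests:
        return 100.0
    good = sum(1 for r in requests if is_good(r))
    return 100.0 * good / len(requests)


def client_render_sli(attempts: int, successes: int) -> float:
    """Percentage of client-side render attempts that displayed a chart."""
    if attempts == 0:
        return 100.0
    return 100.0 * successes / attempts
```

Note that a slow success and a fast error both count against the combined SLI, which is what makes it a single latency-availability measure rather than two separate ones.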
Operational
Monitoring
The microservice receives requests for /_info from a blackbox prober, which will page SRE if the service is unreachable or otherwise not responding.
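For ad hoc checks, the same kind of probe can be approximated in a few lines of Python. This is a sketch only and not how the production prober is implemented: the `probe` function, its `opener` parameter, and the timeout default are hypothetical, and the caller must supply the service URL.

```python
import urllib.request


def probe(url: str, timeout_s: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 within the timeout.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection failures, timeouts, and urllib's URLError and
        # HTTPError, which are all subclasses of OSError.
        return False
```

A caller would pass the full URL of the service's /_info endpoint and treat a False result as "unreachable or not responding".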
Failed client-side render attempts are logged in Logstash. The Chart extension also instruments the number of client-side render attempts, the number of successes, and the duration of each successful render. This data is displayed on this dashboard.
Troubleshooting
The chart-renderer service is stateless and can be run in isolation for single requests, making it relatively simple to debug in a pinch.
Deployment
Updates are deployed through the standard Kubernetes service deployment process with Helm charts, pulling an image from GitLab continuous integration.
Service Level Objectives
Realistic targets
- chart-renderer service: 90% of requests succeed within 200ms. We currently achieve this most of the time, with only a few outliers, likely caused by low overall traffic (less than 1 req/s).
- Chart client side rendering: 99.5% success rate (0.5% error rate). The error rate measured over a 24-hour period has never yet exceeded 0.2%. Over shorter timeframes it is spikier, sometimes reaching a little over 1%, but the p95 is about 0.45%. Targeting an overall error rate no higher than 0.5% should give us some headroom for incidents or unreliable clients.
Ideal targets
- chart-renderer service: 95% of requests succeed within 200ms. High latencies would increase page save times.
- Chart client side rendering: A 99.5% success rate should be more than acceptable to end users, since there's a non-interactive fallback if the render fails.
Reconciliation
- chart-renderer service: 90% of requests succeeding within 200ms is acceptable, since 90th percentile page save times are currently about 2000ms.
- Chart client side rendering: >=99.5% success rate (<=0.5% failure rate)
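The reconciled targets translate directly into error budgets. The sketch below shows the arithmetic in Python; the function names are hypothetical and the event counts used in any example are illustrative, not measured traffic. Targets are passed as strings so that `Fraction` can represent values such as 99.5 exactly.

```python
import math
from fractions import Fraction


def error_budget(target_pct: str, total_events: int) -> int:
    """Number of events that may fail while still meeting the target."""
    allowed_fraction = 1 - Fraction(target_pct) / 100
    return math.floor(allowed_fraction * total_events)


def budget_remaining(target_pct: str, total_events: int, failed_events: int) -> int:
    """Budget left after the observed failures (negative means SLO missed)."""
    return error_budget(target_pct, total_events) - failed_events
```

For example, with a hypothetical one million events in the reporting window, the 90% service target allows 100,000 slow or failed requests, and the 99.5% client-side target allows 5,000 failed renders.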