Test Kitchen/Decision Records/Increase action context limit
Status: Decided, 1 Dec 2025
Author: Julie van der Hoop
Deciders: Julie van der Hoop, Sam Smith, Clare Ming, Mikhail Popov
Consulted: Product Analytics, Morten Warncke-Wang, Tran Tran, Dmitry Brant, Seddon
Date authored: 2025-11-19
Phabricator: T398021
Background
action_context is a field for contextual metadata that describes the circumstances or environment in which a user action or event occurred. In Test Kitchen, it is a short (maximum 64 characters), human-readable phrase that gives meaningful context for the event. The length limit is meant to discourage data stuffing, i.e. packing JSON-encoded strings into this field.
Data stuffing is when we try to cram excessive, unstructured, or poorly organized data into event properties, often as large JSON blobs, rather than following a clean schema. It's a common anti-pattern in analytics instrumentation.
Problem
However, we do run into scenarios where teams hit the limit:
- Thread with Apps (Wikimedia Slack)
- Thread about Moderator Tools's experiment (Wikimedia Slack)
- T408142: Enable recording of user Input for “Something else” incident category
- T398021: Consider increasing the length of action_context property in the base schemas
Decision
Increase the limit from 64 characters to 320. Ask teams to come to us when the limit still gets in their way, and reinforce best practices for avoiding data stuffing.
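A minimal client-side sketch of the new limit, assuming action_context is sent as a JSON-encoded string and that the 320-character limit counts characters (as JSON Schema's maxLength does). The helper name encode_action_context and the constant ACTION_CONTEXT_MAX_LENGTH are hypothetical, not part of Test Kitchen's API.

```python
import json

# Assumed limit after this decision: 320 characters (previously 64).
ACTION_CONTEXT_MAX_LENGTH = 320


def encode_action_context(context: dict) -> str:
    """Serialize a small, flat context dict to a compact JSON string.

    Hypothetical helper: raises ValueError if the encoded string would
    exceed the assumed action_context length limit.
    """
    # Compact separators avoid spending the character budget on whitespace.
    encoded = json.dumps(context, separators=(",", ":"), ensure_ascii=False)
    if len(encoded) > ACTION_CONTEXT_MAX_LENGTH:
        raise ValueError(
            f"action_context is {len(encoded)} characters; "
            f"limit is {ACTION_CONTEXT_MAX_LENGTH}"
        )
    return encoded


print(encode_action_context({"article_count": 12}))
```

Here the encoded context is `{"article_count":12}`, 20 characters. Failing loudly at encode time, rather than silently truncating, keeps an oversized context from being half-logged.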
Good use of action_context
- Timed ticks for measuring session length:
action: tick, action_context: {"time_s": <INT>}
- How many articles are in the reading list when the user adds another?
action: click, action_subtype: save_article_to_reading_list, action_context: {"article_count": <INT>}
- When a user visits their Watchlist, whether there are changes listed that they can click on, or whether it's empty:
action: page_visit, action_context: {"has_changes": true}
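The three examples above can be sketched as event payloads. This assumes each event is a plain dict with the action, action_subtype, and action_context fields named in the examples; the build_event helper and the concrete values (60 seconds, 7 articles) are hypothetical. Each context holds a single flat key, so the encoded string stays well under even the old 64-character limit.

```python
import json
from typing import Optional


def build_event(action: str, context: dict, action_subtype: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble an event with a compact JSON action_context."""
    event = {"action": action}
    if action_subtype is not None:
        event["action_subtype"] = action_subtype
    # One flat key per context keeps the string short and easy to query.
    event["action_context"] = json.dumps(context, separators=(",", ":"))
    return event


events = [
    # Timed tick for measuring session length (60 s is an illustrative value).
    build_event("tick", {"time_s": 60}),
    # Reading-list size at the moment another article is saved.
    build_event("click", {"article_count": 7},
                action_subtype="save_article_to_reading_list"),
    # Watchlist visit: are there changes the user can click on?
    build_event("page_visit", {"has_changes": True}),
]

for event in events:
    print(event["action"], event["action_context"], len(event["action_context"]))
```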
Bad use of action_context
- Using it as a dumping ground for whatever might be useful one day:
{
  "action": "click",
  "action_source": "get_support",
  "action_context": {
    "page_name": "Special:Preferences",
    "scroll_position": 342,
    "mouse_x": 1024,
    "mouse_y": 768,
    "time_on_page": 47,
    "previous_5_pages": [...],
    "browser_memory_usage": "245MB",
    "cpu_usage": "23%",
    "network_speed": "4G"
  }
}
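One way to catch the dumping-ground pattern before it ships is a lint check on the context object. This is a sketch under assumed rules, not an existing Test Kitchen check: it flags nested objects or arrays, and any context with more than a hypothetical MAX_KEYS of 3. The threshold is illustrative and would need tuning against real instrumentation.

```python
# Hypothetical threshold: a well-scoped context rarely needs more than a few keys.
MAX_KEYS = 3


def stuffing_warnings(context: dict) -> list[str]:
    """Return human-readable warnings for signs of data stuffing in a context dict."""
    warnings = []
    if len(context) > MAX_KEYS:
        warnings.append(f"{len(context)} keys (more than {MAX_KEYS})")
    for key, value in context.items():
        # Nested structures are hard to query and easy to overfill.
        if isinstance(value, (dict, list)):
            warnings.append(f"'{key}' holds a nested {type(value).__name__}")
    return warnings


good = {"has_changes": True}
bad = {
    "page_name": "Special:Preferences",
    "scroll_position": 342,
    "mouse_x": 1024,
    "mouse_y": 768,
    "time_on_page": 47,
    "previous_5_pages": [],  # contents elided in the original example
    "browser_memory_usage": "245MB",
    "cpu_usage": "23%",
    "network_speed": "4G",
}

print(stuffing_warnings(good))
print(stuffing_warnings(bad))
```

The flat Watchlist context passes cleanly, while the dumping-ground context is flagged both for its key count and for the nested list.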
Why is data stuffing bad?
When people stuff action_context with huge JSON objects:
- Privacy and data collection risks: This matters most to WMF. Data stuffing makes it too easy to accidentally log PII or other sensitive data, because we often don't think through the contextual attributes being collected, or how their combinations can increase data collection risk.
- Querying becomes painful: Deeply nested JSON is harder to analyze, and most analytics databases (and analysts) aren't optimized for it. The result is verbose, complicated queries full of JSON extraction functions, e.g. CAST(GET_JSON_OBJECT(action_context, '$.timing_ms') AS INT), which make analysis slow and error-prone.
- Performance degradation: Large payloads increase network overhead, storage costs, and processing time. If every event carries kilobytes of context data, infrastructure costs grow rapidly.
- Analysis paralysis: Our analysis is designed to be automated, so the query pain described above applies directly. When manual analysis is needed, analysts don't always know what data exists or where to find it. Documentation becomes impossible when the schema is "just throw it in here."
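To make the querying cost concrete, here is a rough Python analogue of the CAST(GET_JSON_OBJECT(...) AS INT) expression quoted in the list. It assumes events arrive as dicts whose action_context is a JSON string; the key timing_ms comes from that query, while the sample events and the extract_int helper are hypothetical. Every consumer has to repeat this defensive parsing, whereas a top-level integer property could simply be read.

```python
import json
from typing import Optional


def extract_int(action_context: str, key: str) -> Optional[int]:
    """Roughly mirror CAST(GET_JSON_OBJECT(action_context, '$.key') AS INT).

    Returns None when the string is not valid JSON, is not an object,
    lacks the key, or holds a value that cannot be cast to int.
    """
    try:
        parsed = json.loads(action_context)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or key not in parsed:
        return None
    try:
        return int(parsed[key])
    except (TypeError, ValueError):
        return None


events = [
    {"action_context": '{"timing_ms": 250}'},
    {"action_context": '{"timing_ms": "250"}'},  # number stored as a string
    {"action_context": '{"timingMs": 250}'},     # inconsistent key spelling
    {"action_context": 'clicked fast'},          # free text, not JSON
]

print([extract_int(e["action_context"], "timing_ms") for e in events])
```

The inconsistent key spelling silently yields None, which is exactly the kind of error that is easy to miss in an automated analysis.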
Instead of data stuffing:
- Be intentional: Decide what context actually matters for analysis. More data isn't better if it isn't actionable.
- Define a clear schema: Specify exactly which context fields are allowed and their data types. Schema versioning is possible, and we can expand a schema to fit specific needs (e.g., for Apps).
- Flatten strategically: Use top-level properties for commonly queried dimensions rather than nesting everything.
- Use references: Store large contextual data separately and reference it by ID.
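The "define a clear schema" advice can be sketched as an allowlist of context fields and their types, checked before an event is sent. The CONTEXT_SCHEMA mapping reuses field names from the good-use examples but is hypothetical; a real setup would more likely express this in the versioned event schema itself than in client code.

```python
from typing import Any

# Hypothetical allowlist: each permitted context field and its expected type.
CONTEXT_SCHEMA: dict[str, type] = {
    "time_s": int,
    "article_count": int,
    "has_changes": bool,
}


def schema_errors(context: dict[str, Any]) -> list[str]:
    """List fields that are not allowed or do not have the declared type."""
    errors = []
    for key, value in context.items():
        expected = CONTEXT_SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown field '{key}'")
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"'{key}' should be {expected.__name__}")
    return errors


print(schema_errors({"article_count": 7}))
print(schema_errors({"article_count": "7", "mouse_x": 1024}))
```

An allowlist rejects unexpected fields by default, so adding a new context field becomes a deliberate schema change rather than a quiet addition.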