The Wikimedia Event Platform refers to the various event stream distribution and processing systems we employ at the Wikimedia Foundation. Several components make up the event platform, some 'legacy' and some 'modern'. This page documents these components, provides some historical context, and outlines future work and intentions.
More documentation can be found on component-specific pages:
- Event Platform/Instrumentation How To - How to use Event Platform to instrument and collect analytics event data
- Event Platform/EventGate - Documentation about the EventGate event intake service.
- Event Platform/EventStreams & Event Platform/EventStreams/Administration - Documentation about the public event publishing service.
- Analytics/Systems/EventLogging & Analytics/Systems/EventLogging/Administration - Documentation about the original EventLogging Analytics event intake and distribution systems.
- Event Platform/Schemas - Details about event schema repositories and how they are used.
- Event Platform/Schemas/Guidelines - Event schema rules and conventions.
- Modern Event Platform Phabricator Parent Task
- Event* - Event Platform/Service disambiguation page.
Platform Architecture Summary
- Event - A strongly typed and schemaed piece of data, usually representing something happening at a definite time. E.g. revision-create, user-button-click, page-load, etc.
- Stream - A contiguous (often unending) collection of events (loosely) ordered by time.
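The two definitions above can be illustrated with a minimal Python sketch. The `Event` class, its field names (`schema`, `dt`, `data`), and the `as_stream` helper are hypothetical and chosen only for illustration; they are not the platform's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One typed record describing something that happened at a definite time."""
    schema: str   # hypothetical schema identifier, e.g. "revision-create"
    dt: datetime  # when the thing happened
    data: Dict[str, Any] = field(default_factory=dict)  # schema-specific payload


def as_stream(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events ordered by time. A real stream is usually unending;
    this sketch sorts a finite batch to show the ordering idea only."""
    yield from sorted(events, key=lambda e: e.dt)


events = [
    Event("page-load", datetime(2020, 1, 1, 12, 0, 5, tzinfo=timezone.utc), {"page": "Main_Page"}),
    Event("revision-create", datetime(2020, 1, 1, 12, 0, 1, tzinfo=timezone.utc), {"rev_id": 1}),
]

ordered = [e.schema for e in as_stream(events)]
print(ordered)  # ['revision-create', 'page-load']
```

In practice a stream is only loosely ordered by time, so consumers should not assume the strict ordering that the sort in this sketch produces.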
The first 'event platform' at WMF was EventLogging. This system originally used ZeroMQ to transport messages between its various 'services' but was later improved to use Kafka. It was built for WMF product teams to be able to instrument and measure WMF features and usage on websites and apps. It used a (hardcoded) on-wiki event 'schema repository' to validate incoming events.
In 2015, an effort was made to extend EventLogging beyond its analytics focus so that it could also handle production events. This effort was dubbed 'EventBus' and culminated in three new components: the MediaWiki EventBus extension, the mediawiki/event-schemas git repository, and eventlogging-service-eventbus. eventlogging-service-eventbus was the first internal HTTP POST endpoint. It validated events against more tightly controlled production schemas and produced them to Kafka. EventBus was used to build the Change Propagation service. We originally intended to merge the analytics and production uses of EventLogging.
In 2018, we started the Modern Event Platform program, which included EventBus's original analytics+production unification goal as well as rebuilding other parts of WMF's event processing stack using open source (non-homegrown) components where possible. The EventLogging Python codebase was too WMF- and MediaWiki-specific to easily accomplish the unification, so it was decided to build a new, more generic and extensible JSONSchema event service, eventually named EventGate.
In 2019, EventGate, along with other Modern Event Platform components, replaced eventlogging-service-eventbus. It is intended to eventually replace the 'analytics' deployment of EventLogging services (e.g. eventlogging-processor) as well.