Event Platform


The Wikimedia Event Platform refers to the event stream distribution and processing systems used at the Wikimedia Foundation. The platform is made up of several components, some 'legacy' and some 'modern'. This page documents these components, provides some historical context, and outlines future work and intentions.


More documentation can be found on component-specific pages:

Platform Architecture Summary

TODO

Historical Overview

Some terms:

  • Event - A strongly typed and schemaed piece of data, usually representing something happening at a definite time. E.g. revision-create, user-button-click, page-load, etc.
  • Stream - A contiguous (often unending) collection of events (loosely) ordered by time.
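The two terms above can be sketched as plain data types. This is only an illustration, not how any Event Platform component represents events: the `Event` class, its field names (`schema`, `dt`, `data`), and the `stream` and `example_events` helpers are all hypothetical, and a real stream is usually unending rather than a finite list that can be sorted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """A typed record of something that happened at a definite time."""
    schema: str                     # hypothetical schema name, e.g. "revision-create"
    dt: datetime                    # when the event occurred
    data: Dict[str, Any] = field(default_factory=dict)  # schema-specific payload


def stream(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events ordered by time.

    A real stream is often unending and only loosely ordered; sorting a
    finite collection is a simplification made for this sketch.
    """
    yield from sorted(events, key=lambda e: e.dt)


def example_events() -> List[Event]:
    """Two made-up events, deliberately listed out of time order."""
    return [
        Event("page-load",
              datetime(2019, 1, 1, 12, 0, 5, tzinfo=timezone.utc),
              {"page": "Main_Page"}),
        Event("revision-create",
              datetime(2019, 1, 1, 12, 0, 1, tzinfo=timezone.utc),
              {"rev_id": 1}),
    ]


if __name__ == "__main__":
    for event in stream(example_events()):
        print(event.dt.isoformat(), event.schema)
```

Iterating over `stream(example_events())` yields the revision-create event before the page-load event, because the stream is ordered by event time rather than by arrival order.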

The first 'event platform' at WMF was EventLogging. This system originally used ZeroMQ to transport messages between its various 'services' but was later improved to use Kafka. It was built for WMF product teams to be able to instrument and measure WMF features and usage on websites and apps. It used a (hardcoded) on-wiki event 'schema repository' to validate incoming events.

In 2015, an effort was made to extend EventLogging beyond its analytics focus so that it could also be used for production events. This effort was dubbed 'EventBus' and culminated in three new components: the MediaWiki EventBus extension, the mediawiki/event-schemas git repository, and eventlogging-service-eventbus. eventlogging-service-eventbus was the first (internal) HTTP POST endpoint. It validated events against more tightly controlled production schemas and produced them to Kafka. EventBus was used to build the Change Propagation service. The original intention was to merge the analytics and production uses of EventLogging.

In 2018, we started the Modern Event Platform program, which included EventBus's original analytics+production unification goal as well as the goal of basing other parts of WMF's event processing stack on open source (non-homegrown) components where possible. The EventLogging Python codebase was too WMF- and MediaWiki-specific to easily accomplish the unification, so it was decided to build a new, more generic and extensible JSONSchema event service, eventually named EventGate.

As of 2019, EventGate, along with other Modern Event Platform components, is intended to replace both eventlogging-service-eventbus and the 'analytics' deployment of EventLogging services (e.g. eventlogging-processor).