Event Platform/EventGate

From Wikitech

EventGate is an HTTP service for ingestion of events, written in Node.js. It takes JSON messages over HTTP POST requests, optionally validates them against a JSONSchema, and then produces them to a backend. The default backend (and the one used at WMF) is Kafka.

For more information about the EventGate codebase, see the README at gitlab.wikimedia.org/repos/data-engineering/eventgate. EventGate is meant to be generic and not WMF specific. It can be used standalone as a library, or it can be used with the built-in support for running Express (the Node.js HTTP server provided via mediawiki-service-template). WMF operates EventGate using the Express Node.js HTTP server.

This page documents the WMF deployments of EventGate.

Purpose

Event Platform seeks to standardize the process of producing and consuming event streams at WMF.

EventGate is an HTTP proxy for producing events that follows our Event Platform producer requirements.

Service

Main article: Event Platform/EventGate/Administration

Implementation

EventGate code is primarily hosted on GitLab. It was formerly hosted on GitHub. There is plenty of tooling for using Kafka with Avro but not much for Kafka with JSONSchema, so by hosting on GitHub we hoped to gain more visibility and participation from non-WMF developers.

The eventgate-wikimedia repository contains the WMF deployment of EventGate. It uses the eventgate package as an npm dependency, and has additional utilities, configuration and deployment pipeline code for WMF's instances of EventGate.

To modify the behavior of the eventgate library or service itself, work on the EventGate repository directly instead: https://gitlab.wikimedia.org/repos/data-engineering/eventgate.

Operation

There are multiple clusters of EventGate services in Wikimedia production.

EventGate is hosted in production as a containerized service running on Kubernetes. As such, there are no Puppet roles or persistent backend hostnames.

As of Feb 2020, the following clusters exist (per deployment-charts):

  • eventgate-main
  • eventgate-analytics
  • eventgate-analytics-external
  • eventgate-logging-external

More details on how each cluster is used are given further below.

Wikimedia EventGate

Wikimedia's EventGate wrapper implements custom validate and produce behaviours for our use in WMF production. This includes configuration to look up JSON Schemas from either the local filesystem (for eventgate-main) or a remote schema registry URL (for eventgate-analytics).

EventGate expects that specific implementations know how to map from an individual event to its JSONSchema. We use the $schema field in each of our JSON events to do this. This field contains a relative and versioned URI to the event's JSONSchema. EventGate fetches and caches this schema and uses it to validate each event with the same $schema.
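The lookup-and-cache step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the EventGate implementation (which is written in Node.js): the SchemaCache class, the loader callback, the file:///hypothetical/schemas base URI and the /example/event/1.0.0 schema URI are all assumptions made up for the example. Actual JSON Schema validation is left out and would be handed to a validator library once the schema is found.

```python
from typing import Callable, Dict, List, Optional

Schema = Dict[str, object]
# A loader takes an absolute schema URL and returns the schema, or None if not found.
Loader = Callable[[str], Optional[Schema]]


class SchemaCache:
    """Resolves an event's relative $schema URI against base URIs and caches the result."""

    def __init__(self, base_uris: List[str], loader: Loader) -> None:
        self.base_uris = base_uris
        self.loader = loader
        self._cache: Dict[str, Schema] = {}

    def schema_for(self, event: Dict[str, object]) -> Schema:
        schema_uri = event.get("$schema")
        if not isinstance(schema_uri, str) or not schema_uri:
            raise ValueError("event has no $schema field")
        # Events sharing the same $schema reuse the cached schema.
        if schema_uri in self._cache:
            return self._cache[schema_uri]
        # Try each base URI in order (e.g. local files first, then a remote registry).
        for base in self.base_uris:
            url = base.rstrip("/") + "/" + schema_uri.lstrip("/")
            schema = self.loader(url)
            if schema is not None:
                self._cache[schema_uri] = schema
                return schema
        raise LookupError(f"no schema found for {schema_uri}")


# Demo with a fake loader that records which URLs were requested (hypothetical data).
calls: List[str] = []


def fake_loader(url: str) -> Optional[Schema]:
    calls.append(url)
    if url == "https://schema.wikimedia.org/example/event/1.0.0":
        return {"title": "example/event", "type": "object"}
    return None


cache = SchemaCache(
    ["file:///hypothetical/schemas", "https://schema.wikimedia.org"], fake_loader
)
event = {"$schema": "/example/event/1.0.0", "message": "hello"}
first = cache.schema_for(event)   # misses the local base, then finds the remote schema
second = cache.schema_for(event)  # served from the cache, no loader call
```

In this sketch the second lookup does not touch the loader at all, which is the point of keying the cache on the versioned $schema URI.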

See also Hadoop Event Ingestion Lifecycle.

Producer types: Guaranteed and Hasty

Wikimedia's EventGate configuration offers two different Kafka producer connections, named guaranteed and hasty.

The guaranteed producer will block the HTTP response until the event has been validated and sent to the Kafka brokers, with either an ACK response or a known failure to ACK. Note that "guaranteed" does not mean that every event is guaranteed to be persisted in Kafka (there is no indefinite retry or other buffer). Rather, it means that the HTTP response status can be trusted: a 2xx status code guarantees that the event has been persisted.

The hasty producer is optimised for high throughput, and will not block the HTTP response. Instead, it will return a 202 status response as soon as EventGate has received the JSON message from the HTTP request body. The event will be validated and produced to Kafka afterward. The HTTP client that submitted the event will not know whether the event was valid or whether it was successfully persisted in Kafka. However, if the event fails validation or fails to be produced to Kafka, an error will be logged to Logstash.

The guaranteed producer type is the default for the /v1/events endpoint. To POST an event in hasty mode, set hasty=true in the request query parameters.
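As an illustration of the two modes, the following Python sketch builds (but does not send) a POST request for the /v1/events endpoint and interprets the response status the way the paragraphs above describe. The helper names, the http://localhost:8192 base URL and the example event fields are assumptions for this example only; only the /v1/events path, the hasty=true query parameter and the 202 / 2xx semantics come from the text above.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url: str, event: Dict[str, object], hasty: bool = False) -> Request:
    """Build a POST request for /v1/events; guaranteed mode is the default."""
    url = base_url.rstrip("/") + "/v1/events"
    if hasty:
        # Hasty mode is selected with a query parameter.
        url += "?" + urlencode({"hasty": "true"})
    body = json.dumps(event).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_status(status: int, hasty: bool) -> str:
    """Summarize what a client may conclude from the response status."""
    if hasty and status == 202:
        return "received; validation and Kafka produce happen afterward, outcome unknown to the client"
    if not hasty and 200 <= status < 300:
        return "validated and persisted in Kafka"
    return "not confirmed; do not assume the event was persisted"


# Hypothetical event and local base URL, for illustration only.
example_event = {"$schema": "/example/event/1.0.0", "message": "hello"}
hasty_request = build_event_request("http://localhost:8192", example_event, hasty=True)
guaranteed_request = build_event_request("http://localhost:8192", example_event)
```

Sending the request (for instance with urllib.request.urlopen) is left out so the sketch has no network dependency.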

EventGate clusters

At WMF, EventGate is deployed as multiple separate clusters, each with its own defined purpose.

All EventGate clusters are open to receive events from any internal production server. MediaWiki produces to EventGate using the EventBus extension. (Apologies if this is confusing! See Event* for a disambiguation page for related terms.)

eventgate-main

  • Visibility: internal, submissions restricted.
  • Schemas: bundled, from local filesystem AND dynamic, via https://schema.wikimedia.org
  • Stream config: only requested on service startup

The eventgate-main cluster produces events to the Kafka "main" clusters in both Eqiad and Codfw. It is used for low(ish) volume but high-priority events. These events are necessary for functioning of Wikimedia core services, like the MediaWiki Job Queue and change-propagation.

Events are submitted here by:

eventgate-analytics

  • Visibility: internal, submissions restricted.
  • Schemas: bundled, from local filesystem AND dynamic, via https://schema.wikimedia.org
  • Stream config: only requested on service startup

The eventgate-analytics cluster produces events to the Kafka "jumbo" cluster, and is intended for high-volume but low-priority events. Events produced to eventgate-analytics should not be required for functional production services. The Kafka jumbo cluster only exists in Eqiad, and has neither cross-DC replication nor a cross-DC failover mode. Events originating from Codfw are produced directly to Kafka "jumbo-eqiad".

Events are submitted here by:

eventgate-analytics-external

The eventgate-analytics-external cluster produces events to the Kafka jumbo cluster. This replaces EventLogging Analytics, and can receive and validate events from external clients (like the EventLogging service before it).

Events are submitted here by:

eventgate-logging-external

The eventgate-logging-external cluster produces events to the Kafka "logging" cluster. It accepts mediawiki/client/error events from external clients.

Events are submitted here by:

Event Stream Config

EventGate in production will request stream configuration from the EventStreamConfig MediaWiki API. Each service cluster restricts the stream configuration it uses via the destination_event_service setting; only streams that have destination_event_service matching the EventGate service cluster name (e.g. eventgate-main) will be used by that EventGate service cluster.
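The filtering rule can be illustrated with a small Python sketch. The stream names and the shape of the stream_configs dictionary below are hypothetical; only the destination_event_service setting and the matching rule are taken from the description above.

```python
from typing import Dict

StreamConfigs = Dict[str, Dict[str, str]]


def streams_for_service(stream_configs: StreamConfigs, service_name: str) -> StreamConfigs:
    """Keep only streams whose destination_event_service matches this service cluster."""
    return {
        name: config
        for name, config in stream_configs.items()
        if config.get("destination_event_service") == service_name
    }


# Hypothetical stream configuration, for illustration only.
stream_configs: StreamConfigs = {
    "example.stream-a": {"destination_event_service": "eventgate-main"},
    "example.stream-b": {"destination_event_service": "eventgate-analytics"},
    "example.stream-c": {"destination_event_service": "eventgate-main"},
    "example.stream-d": {},  # no destination set, so no service cluster picks it up
}

main_streams = streams_for_service(stream_configs, "eventgate-main")
```

Here main_streams contains only example.stream-a and example.stream-c; the other two streams are ignored by a cluster named eventgate-main.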

See also: Event_Platform/Stream_Configuration

Validation Errors

All EventGate clusters at WMF are configured to send validation error events to Logstash. These errors are routed via Kafka, then ingested into Hive (into the various event.eventgate_*_error_validation tables) as well as into Logstash.

Logstash dashboard: eventgate-validation

Grafana EventGate dashboard

Local development

eventgate-wikimedia-dev.js

The eventgate-wikimedia codebase comes with an EventGate development implementation in eventgate-wikimedia-dev.js. To use it, run npm install --no-optional and then ./eventgate-wikimedia-dev.js. Its default EventGate config is in ./config.dev.yaml.