EventStreams/Administration

See EventStreams for an overview of the EventStreams service.

EventStreams is a service-template-node based service. It glues together KafkaSSE with common Wikimedia service features, like logging, error reporting, metrics, configuration and deployment.

Internally, EventStreams is available at eventstreams.svc.${::site}.wmnet. Varnish and LVS route requests for stream.wikimedia.org to it. RCStream is also served from this domain: by default, Varnish routes all requests to EventStreams except those that match RCStream routes. EventStreams supersedes RCStream, which is deprecated and will be decommissioned in the future.
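
As a rough illustration of the routing rule described above, the following Python sketch models the decision that is made for each request. The real logic lives in the Varnish configuration, not in Python. The prefixes in `RCSTREAM_PREFIXES` and both function names are assumptions made for this example; the actual RCStream route patterns may differ.

```python
# Hypothetical RCStream route prefixes. The real patterns are defined in the
# Varnish configuration and may differ from these.
RCSTREAM_PREFIXES = ("/rc", "/socket.io")


def internal_hostname(site: str) -> str:
    """Expand eventstreams.svc.${::site}.wmnet for a given site name."""
    return f"eventstreams.svc.{site}.wmnet"


def backend_for(path: str, rcstream_prefixes=RCSTREAM_PREFIXES) -> str:
    """Pick the backend for a request path on stream.wikimedia.org.

    Every request goes to EventStreams unless its path matches one of the
    RCStream routes.
    """
    for prefix in rcstream_prefixes:
        if path == prefix or path.startswith(prefix + "/"):
            return "rcstream"
    return "eventstreams"
```

The point of the sketch is the default: EventStreams is the fallback backend, so removing the RCStream routes later would leave all traffic going to EventStreams without further changes.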

Configuration

EventStreams is configured in Puppet via role::eventstreams::* Hiera variables. The production configuration is in the Hiera scb role.

role::eventstreams::streams maps stream routes to composite topics in Kafka. Our event topics are prefixed by datacenter name, and this mapping hides that detail from EventStreams consumers. Any combination of stream name -> composite topic list is possible, e.g.:

role::eventstreams::streams:
  recentchange:
    topics:
      - eqiad.mediawiki.recentchange
      - codfw.mediawiki.recentchange
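
To make the mapping concrete, here is a minimal Python sketch of how a stream name could be resolved to its Kafka topics. It is not the service's implementation (EventStreams is a NodeJS service). The `STREAMS` dict simply mirrors the hiera example above, and `topics_for_stream` and `composite_topics` are hypothetical helper names.

```python
# Mirrors the role::eventstreams::streams example above.
STREAMS = {
    "recentchange": {
        "topics": [
            "eqiad.mediawiki.recentchange",
            "codfw.mediawiki.recentchange",
        ],
    },
}


def topics_for_stream(stream: str, streams=STREAMS) -> list:
    """Return the Kafka topics that back a public stream name."""
    try:
        return list(streams[stream]["topics"])
    except KeyError:
        raise LookupError(f"unknown stream: {stream}") from None


def composite_topics(base_topic: str, datacenters=("eqiad", "codfw")) -> list:
    """Build the datacenter-prefixed topic names for an unprefixed topic."""
    return [f"{dc}.{base_topic}" for dc in datacenters]
```

A consumer only ever asks for `recentchange`; the datacenter-prefixed topic names stay an internal detail of the configuration.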

Kafka

Currently, EventStreams is backed by the analytics-eqiad Kafka cluster. The name 'analytics' is a historical artifact and no longer describes this cluster. We plan to move this cluster to a new 'aggregate' Kafka cluster outside of the Analytics VLAN.

EventStreams is not backed by the main Kafka clusters because those are less powerful clusters designed for high-priority, low-volume use. Because the analytics-eqiad Kafka cluster exists only in eqiad, the codfw deployment of EventStreams consumes messages across datacenters.

Kafka MirrorMaker

Most events exposed by EventStreams do not originate in the analytics-eqiad Kafka cluster. The primary events are produced to the main Kafka clusters, and then instances of Kafka MirrorMaker copy those events into the analytics-eqiad cluster.
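
The data flow described above can be pictured with a toy Python model. This is only a sketch: real MirrorMaker is a long-running Kafka consumer/producer pair that tracks offsets, whereas this version copies everything once. The variable names `main_eqiad` and `main_codfw` and the sample messages are assumptions for the example.

```python
from collections import defaultdict


def mirror(source: dict, target: dict) -> int:
    """Copy every message from source topics into same-named target topics.

    Returns the number of messages copied.
    """
    copied = 0
    for topic, messages in source.items():
        target[topic].extend(messages)
        copied += len(messages)
    return copied


# Toy stand-ins for the main clusters (names and contents are assumptions).
main_eqiad = {"eqiad.mediawiki.recentchange": ["event-a", "event-b"]}
main_codfw = {"codfw.mediawiki.recentchange": ["event-c"]}

# Toy stand-in for the analytics-eqiad cluster that EventStreams reads from.
analytics_eqiad = defaultdict(list)

for main_cluster in (main_eqiad, main_codfw):
    mirror(main_cluster, analytics_eqiad)
```

Because topic names carry a datacenter prefix, topics mirrored from different main clusters do not collide in the target cluster, and a stream can then be composed from both of them.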

NodeJS Kafka Client

KafkaSSE uses node-rdkafka (as do other production NodeJS services that use Kafka).

Repositories

Repository             Description
KafkaSSE (github)      Generic Kafka Consumer -> SSE NodeJS library.
eventstreams (github)  EventStreams implementation using KafkaSSE and service-template-node.
eventstreams/deploy    Deploy repository for EventStreams, contains scap3 config and node dependencies.

Deployment

EventStreams is deployed by Scap3 to the scb production service cluster. In deployment-prep, EventStreams is deployed to sca hosts.

Submitting changes

Change to KafkaSSE library

KafkaSSE is hosted in Diffusion, so you must use arc / Differential to submit patches. Follow the instructions at https://www.mediawiki.org/wiki/Phabricator/Arcanist#Using_arcanist to submit, review, and finally merge a patch.

kafka-sse is an npm dependency of EventStreams.

Change to eventstreams repository

EventStreams is hosted in Gerrit. Use git review to submit patches.

Update eventstreams/deploy repository

Once you've made changes to either of the above two repositories, you'll need to rebuild the eventstreams/deploy repository. The easiest way to do this is to use service-runner's docker builder. Follow the instructions at https://www.mediawiki.org/wiki/ServiceTemplateNode/Deployment#Local_git to do so.

Deploy

SSH to the deployment server and run the following commands to deploy the latest commit in the eventstreams/deploy repository.

ssh deployment.eqiad.wmnet # or deployment-tin.deployment-prep.eqiad.wmflabs
cd /srv/deployment/eventstreams/deploy
git pull && git submodule update
scap deploy

Logs

Logs are output to disk on target hosts in /srv/log/eventstreams/.

Metrics

https://grafana.wikimedia.org/dashboard/db/eventstreams

Incidents