Analytics/Systems/EventLogging/Architecture

This page explains the topology of WMF's EventLogging system and how its parts interact, using the following diagram as a reference:

[Diagram: EventLogging architecture]

  • varnishkafka sends raw client-side events (URL-encoded JSON in the query string) from Varnish to the eventlogging-client-side Kafka topic.
  • An eventlogging-processor consumes these raw events, decodes and validates them, and produces them back to Kafka as JSON strings in two kinds of topics: eventlogging-valid-mixed and eventlogging_<schemaName>. eventlogging-valid-mixed contains the valid events from all schemas, except for a blacklist of high-volume schemas; each eventlogging_<schemaName> topic holds all events for a single schema. (A conceptual sketch of this step follows the list.)
  • eventlogging-valid-mixed is consumed by eventlogging-consumer processes and stored in MySQL and in the eventlogging log files. The eventlogging_<schemaName> topics are consumed by Camus and stored in HDFS, partitioned by <schemaName>/<year>/<month>/<day>/<hour>.
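
The processing step in the second bullet can be sketched in a few lines of Python. This is a conceptual illustration only, assuming the kafka-python client; the broker address, consumer group, and blacklist contents are invented for the example, and real schema validation is elided:

    # Conceptual sketch of an eventlogging-processor, NOT the production code.
    import json
    from urllib.parse import unquote

    from kafka import KafkaConsumer, KafkaProducer

    BROKERS = ['kafka1001.example.org:9092']          # hypothetical broker
    HIGH_VOLUME_BLACKLIST = {'SomeHighVolumeSchema'}  # hypothetical entry

    consumer = KafkaConsumer('eventlogging-client-side',
                             bootstrap_servers=BROKERS,
                             group_id='eventlogging-processor')  # hypothetical group
    producer = KafkaProducer(bootstrap_servers=BROKERS)

    def parse_raw(raw_line):
        """Extract the URL-encoded JSON payload from a raw request line."""
        # Raw events carry the event as URL-encoded JSON in the query string.
        # Validation against the event's schema is elided in this sketch.
        payload = raw_line.split('?', 1)[1].rstrip(';')
        return json.loads(unquote(payload))

    for message in consumer:
        event = parse_raw(message.value.decode('utf-8'))
        data = json.dumps(event).encode('utf-8')
        schema = event['schema']
        # Every valid event goes to its per-schema topic (consumed by Camus)...
        producer.send('eventlogging_%s' % schema, data)
        # ...and, unless the schema is a blacklisted high-volume one, also to
        # the mixed topic consumed into MySQL and the log files.
        if schema not in HIGH_VOLUME_BLACKLIST:
            producer.send('eventlogging-valid-mixed', data)

The two produce calls implement the fan-out described above: every valid event reaches its per-schema topic for HDFS, while only non-blacklisted events reach the mixed topic that feeds MySQL.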

The EventLogging back-end consists of several pieces that consume from and produce to Kafka, making it in effect a single-purpose standalone stream processor. The /etc/eventlogging.d file hierarchy contains the process instance definitions, with a subfolder for each service type. A systemd task uses this file hierarchy and provisions a job for each instance definition. Instance definition files contain command-line arguments for the service program, one argument per line.
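
For illustration, a consumer instance definition at a hypothetical path such as /etc/eventlogging.d/consumers/mysql might contain two arguments, an input URI and an output URI, one per line (both URIs below are made up; the real definitions are managed in puppet):

    kafka:///kafka1001.example.org:9092?topic=eventlogging-valid-mixed
    mysql://eventlog:secret@db.example.org/log?charset=utf8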

An 'eventloggingctl' shell script provides a convenient wrapper for managing EventLogging processes.
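
Assuming the script exposes the usual service-control verbs (an assumption; consult the script itself for the authoritative set), managing all EventLogging jobs at once might look like:

    eventloggingctl status    # report the state of all EventLogging jobs
    eventloggingctl restart   # restart them all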