Event Platform/Flaws

From Wikitech

Like any system, Event Platform has its flaws. This page aims to document them.

It has inherited design decisions that may have seemed sound at the time but, in hindsight, were not.

meta field

The meta field is very confusing. It was originally created as a single subobject holding the fields that EventBus events needed to operate. This made it easier to copy and paste the field between schemas, which we had to do before we had jsonschema-tools, schema $refs, and materialization.

Ideally, the fields in meta would be top level and named appropriately. If we could get rid of meta we would, but doing so would be a lot of work.

meta.domain field

The description of this field (as of 2024-02) is "Domain the event or entity pertains to".

The semantics of this field were never well defined. It is often used to hold a domain name the event pertains to, but it is also sometimes used as the 'business' domain. It is also used by the WMF canary (heartbeat) events system.

dt fields

Every event needs to have an 'event time' field, specifying the time at which the event happened. Ideally, this would be the only field we'd need to require for all events. We would then use this field for Kafka timestamps and Hive hourly partitioning.

However, we accept events from unauthenticated external clients, so we can't totally trust them. A client might send an event with a timestamp in the distant past or future, which would cause issues for data ingestion. In cases where we can't trust the event time, we fall back to using the server side receive time.
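As a rough illustration of the fallback described above, the sketch below prefers the client-supplied event time but falls back to the server receive time when the event time is missing or implausibly far from it. The function name choose_ingestion_time and the max_skew tolerance are hypothetical; this page does not say how trust in the event time is actually decided, so treat this as one possible approach rather than what Event Platform does.

```python
from datetime import datetime, timedelta, timezone


def choose_ingestion_time(event_time, receive_time, max_skew=timedelta(days=1)):
    """Return the timestamp to use for ingestion.

    event_time:   client-supplied time (may be None or untrustworthy)
    receive_time: time the server received the event
    max_skew:     hypothetical tolerance; not a documented Event Platform setting
    """
    if event_time is None:
        return receive_time
    # An event time far in the past or future relative to the receive time
    # is treated as untrustworthy, so the receive time is used instead.
    if abs(receive_time - event_time) > max_skew:
        return receive_time
    return event_time


received = datetime(2021, 4, 1, 12, 0, tzinfo=timezone.utc)
plausible = received - timedelta(minutes=5)
distant_past = datetime(1999, 1, 1, tzinfo=timezone.utc)
```

With these values, choose_ingestion_time(plausible, received) keeps the client time, while choose_ingestion_time(distant_past, received) falls back to the receive time.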

So we need two timestamp fields for ingestion: event time and server receive time. As of 2021-04, the intention is to always use meta.dt for server receive time and dt for event time, and to make the field used for ingestion configurable. These field names are not particularly descriptive, but creating and adopting new dt fields is a nontrivial amount of work.
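To make the configurable choice between dt and meta.dt concrete, here is a minimal sketch in which a per-stream setting names the field to read, and the resulting timestamp is turned into hourly partition values. The helper names, the timestamp_field parameter, the example stream name, and the year/month/day/hour partition layout are assumptions made for illustration; only the dt and meta.dt field names come from this page.

```python
from datetime import datetime, timezone


def get_path(event, dotted_path):
    """Look up a possibly nested field such as 'meta.dt' in an event dict."""
    value = event
    for key in dotted_path.split("."):
        value = value[key]
    return value


def parse_dt(text):
    """Parse an ISO 8601 UTC timestamp; 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(text.replace("Z", "+00:00")).astimezone(timezone.utc)


def ingestion_time(event, timestamp_field):
    """Read the ingestion timestamp from the configured field (hypothetical setting)."""
    return parse_dt(get_path(event, timestamp_field))


def hourly_partition(ts):
    """Hourly partition values; this layout is an assumption, not a documented one."""
    return {"year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour}


# The event happened just before the hour but was received just after it.
example_event = {
    "dt": "2021-04-01T11:59:58Z",
    "meta": {"dt": "2021-04-01T12:00:03Z", "stream": "example.stream"},
}

by_event_time = hourly_partition(ingestion_time(example_event, "dt"))
by_receive_time = hourly_partition(ingestion_time(example_event, "meta.dt"))
```

In this example the same event lands in hour 11 when dt is used and in hour 12 when meta.dt is used, which is why the choice of field matters for partitioning.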