Event Platform/Decision Log

From Wikitech
ID Category Decision Description Alternatives Considered Responsible Parties Decision Date Comments
#000 MediaWiki Event Carried State Transfer How to store MediaWiki state, including revision content bodies, in events. Andrew Otto, Giuseppe Lavagetto 2022-03 https://www.mediawiki.org/wiki/Technical_decision_making/Decision_records/T291120#2022-03-01:_Otto_and_Giuseppe_discussion
#001 Dumps Flink is not ideal when writing to Iceberg in our environment Spark [Streaming] Dan Andreescu 2022-11-21 While trying to write a proof-of-concept job, we ran into too many Java dependency problems. Building a Maven POM with Flink, Iceberg, Kafka, and the jars we need for our Hadoop environment runs into duplications and version mismatches. It's basically jar hell that may not be worth untangling. The Iceberg team may solve this problem for us in the near future. Meanwhile, Spark (Streaming or plain) seems to be working quite well for this use case.
#002 Event Streams: PyFlink Event Platform will support the API abstractions. PyFlink will be the preferred option we guide people toward. Java is still allowed for experienced users. Focus on Python abstractions so that users don't need to know Flink (which they would for Java/Scala). Andrew Otto, Gabriele Modena, Luke Bowmaker, Thomas Chin 2022-12-07
#003 Event Streams: FlinkSQL We don't support it directly (not in .sql files), but SQL can be embedded in Python code. Andrew Otto, Gabriele Modena, Luke Bowmaker, Thomas Chin 2022-12-07
#004 Event Streams: Java/Scala We support it for experienced Flink engineers for more complex jobs (windowing, state, etc.) Andrew Otto, Gabriele Modena, Luke Bowmaker, Thomas Chin 2022-12-07
#005 Page Change schema model We will not overly flatten schemas for the sake of SQL querying. Andrew Otto, Gabriele Modena, Dan Andreescu 2023-01 T308017#8495502
#006 Page Change schema model We will not deprecate some meta fields now. Andrew Otto, Gabriele Modena, Dan Andreescu 2023-01 T308017#8549053
#007 MediaWiki Event Carried State Transfer streaming repo name mediawiki-event-enrichment https://etherpad.wikimedia.org/p/stream-enrichment-name-bikeshed Andrew Otto, Gabriele Modena, Dan Andreescu, Thomas Chin, Luke Bowmaker 2023-01
#008 Event Error streams Our convention for error event stream names is <job_name>.error, where job_name is a required setting for all jobs. E.g., if job_name is mediawiki_page_content_change_enrichment, the error stream is mediawiki_page_content_change_enrichment.error. Andrew Otto, Gabriele Modena 2023-04 T326536
#009 Stream Major versioning Versioned streams should be declared with a major version suffix, e.g. mediawiki.page_change.v1. Versioning streams is an opt-in, manual convention. Documented here: Stream_Configuration#Stream_versioning Andrew Otto, Gabriele Modena 2023-05 T332212
#010 User Entity Event Schema - user type booleans MediaWiki inconsistently models 'user types', e.g. temp, registered/anonymous, system, imported, etc. MediaWiki will one day refactor its own data model here, but we don't yet know what this data model will be. For now, we will represent these with boolean fields in the user entity data model, instead of trying to be future-proof with a more flexible 'user types' field. Andrew Otto, Gabriele Modena, Dan Andreescu 2023-05 T336506#8855469
#011 Flink HA metadata storage in Zookeeper Using Zookeeper to store Flink HA metadata will avoid manual steps from Service Ops during Kubernetes maintenance. The Zookeeper cluster is owned by Service Ops. There is no formal onboarding process, but we have been given an explicit OK to use it. Andrew Otto, Gabriele Modena 2023-05 T331283#8858396
#012 Flink HA metadata storage in k8s ConfigMaps, until Zookeeper is available As of 2023-07, our deployed version of Zookeeper is too old to use for Flink HA. We will use k8s ConfigMaps instead. Supersedes #011 until we can use Zookeeper. Andrew Otto, Gabriele Modena 2023-06 T338233
#013 Kafka message compression Kafka topics produced by Flink applications should be Snappy-compressed. EventGate producers do this by default, but native Flink applications bypass EventGate. https://wikimedia.slack.com/archives/C03TA2WKGLB/p1693940979971639 Gabriele Modena, Luca Toscano, David Causse 2023-09 T345657
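
The naming conventions in #008 (error streams) and #009 (major version suffixes) can be sketched in a few lines of Python. This is only an illustration: the helper names error_stream_name, versioned_stream_name and split_stream_version are hypothetical and not part of any Event Platform library, and the suffix pattern ".v<integer>" is an assumption generalized from the single documented example, mediawiki.page_change.v1.

```python
import re
from typing import Optional, Tuple

# Assumption: a major version suffix has the form ".v<integer>",
# as in the documented example mediawiki.page_change.v1.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+)\.v(?P<major>\d+)$")


def error_stream_name(job_name: str) -> str:
    """Build the error stream name for a job: <job_name>.error (#008)."""
    if not job_name:
        raise ValueError("job_name is a required setting for all jobs")
    return f"{job_name}.error"


def versioned_stream_name(base_name: str, major: int) -> str:
    """Append a major version suffix to a stream name (#009)."""
    if major < 0:
        raise ValueError("major version must be non-negative")
    return f"{base_name}.v{major}"


def split_stream_version(stream_name: str) -> Tuple[str, Optional[int]]:
    """Split a stream name into (base name, major version).

    Versioning is opt-in, so unversioned names yield None as the version.
    """
    match = _VERSION_SUFFIX.match(stream_name)
    if match is None:
        return stream_name, None
    return match.group("base"), int(match.group("major"))


if __name__ == "__main__":
    print(error_stream_name("mediawiki_page_content_change_enrichment"))
    print(versioned_stream_name("mediawiki.page_change", 1))
    print(split_stream_version("mediawiki.page_change.v1"))
```

Because versioning is opt-in, the sketch treats a name without a suffix as a valid unversioned stream rather than as an error.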

(Note: this doc was moved from a Google Sheet on 2023-06-27)
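
To make the trade-off in #010 concrete, here is a minimal Python sketch of a user entity that models user types as explicit boolean fields rather than as a flexible 'user types' list. All names in it (UserEntity, user_text, is_temp, is_registered, is_system, is_imported, to_event_fragment) are hypothetical and chosen for illustration only; they are not taken from the actual event schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UserEntity:
    """Hypothetical user entity; field names are illustrative assumptions."""

    user_text: str
    # One explicit boolean per user type, as decided in #010.
    is_temp: bool = False
    is_registered: bool = False
    is_system: bool = False
    is_imported: bool = False
    # The rejected alternative would look like: user_types: list[str],
    # which is more flexible but commits to a data model that MediaWiki
    # has not defined yet.


def to_event_fragment(user: UserEntity) -> dict:
    """Return the user entity as a plain dict, ready to embed in an event."""
    return asdict(user)


if __name__ == "__main__":
    user = UserEntity(user_text="ExampleUser", is_registered=True)
    print(json.dumps(to_event_fragment(user), sort_keys=True))
```

With booleans, every event carries the same fixed set of fields, so consumers can filter on a single column without parsing a list.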