Event Platform/SPIKE/Should we enable compression on kafka jumbo
This page summarizes the learnings of https://phabricator.wikimedia.org/T345657
[SPIKE] Should we enable compression on kafka jumbo?
- Author: Gabriele Modena <gmodena@wikimedia.org>
- Bug: https://phabricator.wikimedia.org/T345657
tl;dr: we should enable snappy compression at the Flink producer level. Snappy compression is already the default for events produced by EventGate
and for topics mirrored by MirrorMaker. Flink streaming applications bypass EventGate, but should maintain parity with its configuration settings.
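Concretely, this amounts to setting the standard Kafka producer compression property in the Flink job's producer configuration (a sketch; how the property is wired in, e.g. via the sink builder or a properties file, is deployment-specific):

```properties
# Standard Kafka producer setting; valid values include none, gzip, snappy, lz4, zstd.
compression.type=snappy
```

Because compression happens producer-side, no broker or topic-level change on kafka-jumbo is required for this to take effect.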
Evaluation
This spike aimed to answer the following questions:
What are key metrics on topics that store large messages (total size, percentile message size, throughput, retention)?
A rich set of topic-level metrics is exposed in Grafana. Config settings (retention, partitioning) need to be inspected programmatically
on a per-topic basis and are not kept under version control.
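For example, per-topic settings can be pulled with the standard Kafka CLI that ships with the Kafka distribution (a sketch; the bootstrap address is a placeholder):

```shell
# Describe retention, cleanup policy, and other overrides for a single topic.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name mediawiki.page_content_change.v1 \
  --describe
```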
Some rough numbers over 7 days of mediawiki.page_content_change.v1: the topic has a size in the neighborhood of 762 GiB, and messages have a median size of roughly 388 KB and a max of about 1 MB.
Quartile breakdown (bytes):

  stat   value
  mean   4.140364e+05
  std    1.493413e+05
  min    1.370000e+01
  25%    3.171210e+05
  50%    3.879230e+05
  75%    4.627110e+05
  max    1.032399e+06
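To put these figures in context, the topic size and the mean message size imply a rough message count and rate over the 7-day window (a back-of-the-envelope sketch using only the numbers quoted above; replication factor and per-partition skew are ignored):

```python
# Back-of-the-envelope throughput estimate for mediawiki.page_content_change.v1,
# using the 7-day figures quoted above.
topic_bytes = 762 * 2**30       # ~762 GiB over the window
mean_msg_bytes = 4.140364e5     # mean message size from the quartile breakdown
window_seconds = 7 * 24 * 3600  # 7 days

est_messages = topic_bytes / mean_msg_bytes
msgs_per_sec = est_messages / window_seconds
bytes_per_sec = topic_bytes / window_seconds

print(f"~{est_messages / 1e6:.1f}M messages, "
      f"~{msgs_per_sec:.1f} msg/s, "
      f"~{bytes_per_sec / 2**20:.2f} MiB/s")
```

So the topic's bulk comes from large individual messages rather than a high message rate, which is why per-message compression is attractive here.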
For comparison, mediawiki.page_change.v1 is in the neighborhood of 20-24 GiB and carries significantly smaller records:

  stat   value (bytes)
  mean   11830.387352
  std    2519.479806
  min    5576.000000
  25%    10159.000000
  50%    11955.000000
  75%    13253.000000
  max    20642.000000
mediawiki.page_change.v1 is produced through EventGate, so its payload is assumed to be snappy-compressed.
Should we also enable compression on jumbo?
No need: topics are assumed to be compressed by their producers (EventGate) and by MirrorMaker.
Should we rather let producers take care of it?
Yes. This is the recommended path forward for producers (such as Flink applications) that sidestep EventGate.
Follow up work
The following work has been identified as actionable next steps: