Event Platform/SPIKE/Should we enable compression on kafka jumbo

From Wikitech


This page summarizes the findings of https://phabricator.wikimedia.org/T345657

[SPIKE] Should we enable compression on kafka jumbo?


tl;dr: we should enable snappy compression at the Flink producer level. Snappy compression is the default for events produced by EventGate and for topics mirrored by MirrorMaker. Flink streaming applications bypass EventGate, but should guarantee parity of configuration settings.
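A minimal sketch of what "configuration settings parity" could look like in practice. It assumes the producer is configured through a plain key/value map of Kafka producer properties, where `compression.type` is the standard Kafka setting for compression. The helper names, the sample configuration, and the broker address are hypothetical, and snappy is the only reference setting taken from this page.

```python
# Reference settings this page attributes to EventGate and MirrorMaker.
# Only compression is stated on this page; extend the map as needed.
REFERENCE_PRODUCER_SETTINGS = {"compression.type": "snappy"}


def with_reference_settings(producer_config):
    """Return a copy of producer_config with the reference settings applied."""
    merged = dict(producer_config)
    merged.update(REFERENCE_PRODUCER_SETTINGS)
    return merged


def parity_violations(producer_config):
    """List (key, expected, actual) for settings that differ from the reference."""
    return [
        (key, expected, producer_config.get(key))
        for key, expected in REFERENCE_PRODUCER_SETTINGS.items()
        if producer_config.get(key) != expected
    ]


# Hypothetical Flink producer properties; the broker address is a placeholder.
flink_producer_config = {
    "bootstrap.servers": "kafka-jumbo.example.org:9092",
    "compression.type": "none",
}

# Report mismatches before and after applying the reference settings.
print(parity_violations(flink_producer_config))
print(parity_violations(with_reference_settings(flink_producer_config)))
```

The merged properties would then be handed to whichever Kafka sink or producer builder the streaming application uses. A check like `parity_violations` could run in CI so that drift from the EventGate defaults is caught early.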

Evaluation

This spike set out to answer the following questions:

What are key metrics on topics that store large messages (total size, percentile message size, throughput, retention)?

A rich set of topic-level metrics is exposed in Grafana. Config settings (retention, partitioning) need to be inspected programmatically on a per-topic basis and are not available under version control.

Some rough numbers from 7 days of mediawiki.page_content_change.v1: the topic has a size in the neighborhood of 762GiB, and messages have a median size of about 380KB and a max of about 1MB.

Quartile breakdown:

stat                                         (bytes)
mean                                        4.140364e+05            
std                                         1.493413e+05            
min                                         1.370000e+01            
25%                                         3.171210e+05            
50%                                         3.879230e+05            
75%                                         4.627110e+05            
max                                         1.032399e+06
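The page does not say which tool produced this breakdown. As a rough sketch, the same statistics can be computed from a list of message sizes with the Python standard library. The function name and the sample sizes below are hypothetical, and the numbers are made up for illustration rather than measured on any topic.

```python
import statistics


def size_summary(sizes_bytes):
    """Summarize message sizes (bytes): mean, std, min, quartiles, and max."""
    # "inclusive" treats the data as the full population range and
    # interpolates linearly between neighboring values.
    q1, median, q3 = statistics.quantiles(sizes_bytes, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(sizes_bytes),
        "std": statistics.stdev(sizes_bytes),  # sample standard deviation
        "min": min(sizes_bytes),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(sizes_bytes),
    }


# Made-up sizes for illustration only; not measurements from any topic.
sample_sizes = [1_000, 2_000, 3_000, 4_000, 5_000]

for stat, value in size_summary(sample_sizes).items():
    print(f"{stat:<5}{value:>12.1f}")
```

In a real run, `sample_sizes` would hold the serialized size of each record consumed from the topic over the period of interest.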

For comparison, mediawiki.page_change.v1 is in the neighborhood of 20-24GiB and has significantly smaller records:

stat                                               (bytes)
mean                                        11830.387352    
std                                          2519.479806    
min                                          5576.000000    
25%                                         10159.000000    
50%                                         11955.000000    
75%                                         13253.000000    
max                                         20642.000000

Page change events are produced via EventGate, so I assume the payload is snappy compressed.

Should we also enable compression on jumbo?

No need to. Topics are assumed to be compressed by producers (EventGate) and MirrorMaker.

Should we rather let producers take care of it?

This is the recommended path forward for producers that sidestep EventGate and MirrorMaker, such as Flink streaming applications.

Follow up work

The following work has been identified as actionable next steps: