Analytics/Systems/EventLogging/Performance

Put any performance-related EventLogging information on this page.

Event Logging load test

Setup:

  • using analytics1004.eqiad.wmnet (as of this writing), a beefy if not trusty Cisco box
  • 31492 events, 4 schemas, split 3/3/3/1, generated by the wonderful test_load.py
  • format: seqId[space]?[URL-encoded JSON event] (a sketch of a generator for this format follows this list)
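
For reference, here is a minimal sketch of how a test file in that format could be generated. This is not test_load.py itself; the schema names, event fields, and output path are made up for illustration.

  # sketch: generate "seqId ?url-encoded-json" lines (hypothetical, not the real test_load.py)
  import json
  import random
  import urllib.parse

  SCHEMAS = ['SchemaA', 'SchemaB', 'SchemaC', 'SchemaD']  # placeholder schema names

  with open('load.test.seq.30k', 'w') as out:
      for seq_id in range(1, 31493):
          capsule = {
              'schema': random.choice(SCHEMAS),
              'revision': 1,
              'event': {'value': random.randint(0, 100)},
          }
          # eventlogging-processor's %q field expects a URL-encoded payload starting with '?'
          out.write('%d ?%s\n' % (seq_id, urllib.parse.quote(json.dumps(capsule))))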

Processor

  • 4 runs average: 17.6478 seconds, so ~1784 events per second
  • command: time cat /home/milimetric/load.test.seq.30k | python ./eventlogging-processor "%{seqId}d %q" stdin:// file:///home/milimetric/out.load.30k
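
The events-per-second figures on this page are simply the event count divided by the average wall-clock time reported by time, e.g. for this run:

  # 31492 events over an average of 17.6478 seconds (4 runs)
  print(31492 / 17.6478)  # ~1784 events per second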

Processor (take 2)

  • using a patch from Andrew (https://gerrit.wikimedia.org/r/#/c/221664/), but it shouldn't affect performance for file reading
  • now consuming from a file generated with kafkacat, in the standard format expected in prod
  • 30000 events exactly, as sampled from the prod stream, so some of them are invalid (truncated, bad data, etc)
  • 4 runs average: 21.3743 seconds, so ~1403 events per second
  • command: time cat /home/milimetric/load.test.sample | python bin/eventlogging-processor "%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i" stdin:// file:///home/milimetric/out.load.sample
  • It seems even small variations in format and in the number of invalid events affect performance quite a bit, so we should leave ourselves a fairly large margin in case we get a spike of invalid events (see the sketch below)
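
To probe that margin, one could deliberately corrupt a fraction of an otherwise valid test file and rerun the benchmark. A rough sketch; the file names and the truncation strategy here are made up:

  # sketch: truncate a random fraction of lines to simulate a spike of invalid events
  import random

  INVALID_FRACTION = 0.2  # arbitrary; vary this and watch the events-per-second figure

  with open('load.test.sample') as src, open('load.test.sample.corrupted', 'w') as out:
      for line in src:
          if random.random() < INVALID_FRACTION:
              line = line[:len(line) // 2].rstrip('\n') + '\n'  # chop the payload in half
          out.write(line)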

Processor (take 3 - direct from kafka)

  • reading from the beginning of a Kafka topic so we're not limited by the number of events
  • command: time python bin/eventlogging-processor "%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i" "kafka:///analytics1012.eqiad.wmnet:9092?topic=eventlogging-client-side&auto_offset_reset=smallest" file:///home/milimetric/out.load.from.kafka
  • 40.229 seconds: 40467 events
  • 78.812 seconds: 78185 events
  • 54.090 seconds: 65253 events
  • Average: ~1062 events per second, with a fairly high variance, again suggesting we should give ourselves a large margin
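
For comparison, the raw read speed of the topic (no parsing or validation at all) can be measured with a plain Kafka client. A sketch using the kafka-python package; this is not what eventlogging-processor does internally, and the 30-second idle timeout is an arbitrary choice:

  # sketch: count how fast we can merely read eventlogging-client-side from the beginning
  import time
  from kafka import KafkaConsumer  # assumes the kafka-python package is installed

  consumer = KafkaConsumer(
      'eventlogging-client-side',
      bootstrap_servers='analytics1012.eqiad.wmnet:9092',
      auto_offset_reset='earliest',  # start from the beginning, like the benchmark above
      consumer_timeout_ms=30000,     # give up after 30s with no new messages
  )

  start = time.time()
  count = sum(1 for _ in consumer)
  elapsed = time.time() - start
  print('%d messages in %.1fs (~%.0f/s)' % (count, elapsed, count / elapsed))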

Processor (take 4 - direct from kafka but with custom code to make all events invalid)

  • Idea: does the processing rate suffer if all events are invalid?
  • 18.938 seconds, 13348 events, 704 per second

Processor (take 5 - all events invalid, no console logging to slow things down)

  • Idea: does the logger output, which goes to the console in these tests, slow down the processing?
  • 28.326 seconds, 76995 events, 2719 per second!!!
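
The easy way to check this is to rerun the same load with the logger effectively silenced and compare wall-clock times. A generic sketch of the idea (the logging setup here is illustrative, not eventlogging's actual logger configuration):

  # sketch: compare a logging-per-event loop with and without console output
  import logging
  import sys
  import time

  logging.basicConfig(stream=sys.stderr, level=logging.INFO)
  log = logging.getLogger('bench')

  def run(lines, silent):
      log.setLevel(logging.CRITICAL if silent else logging.INFO)
      start = time.time()
      for line in lines:
          # stand-in for "event failed validation, log it and move on"
          log.info('Unable to process: %s', line)
      return time.time() - start

  lines = ['bad event %d' % i for i in range(100000)]
  print('with logging: %.3fs' % run(lines, silent=False))
  print('silenced:     %.3fs' % run(lines, silent=True))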

Processor (take 6)

  • using Andrew's fancy parallel processor
  • 1 process, 100k events: 71.439 seconds, 68.734
  • 2 processes, 100k events: 35.335 seconds, 35.794
  • 5 processes, 100k events: 15.269 seconds, 15.124
  • 7 processes, 100k events: 12.959 seconds, 12.772
  • 10 processes, 100k events: 11.291 seconds, 11.xxx
  • 12 processes, 100k events: 11.000 seconds, 10.831
  • 24 processes, 100k events: 9.713 seconds, 9.843
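
The numbers above are from Andrew's parallel processor itself; purely to illustrate the shape of the approach, here is a generic multiprocessing sketch (not the actual eventlogging code) that fans line processing out over N worker processes:

  # sketch: generic fan-out of per-line work across worker processes
  from multiprocessing import Pool

  def process_line(line):
      # stand-in for parsing, validating and serializing one raw event
      return line.strip().upper()

  if __name__ == '__main__':
      with open('load.test.sample') as f:
          lines = f.readlines()
      with Pool(processes=12) as pool:  # vary this like the table above
          results = pool.map(process_line, lines, chunksize=1000)
      print('processed %d events' % len(results))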

Consumer

  • NOTE: the MySQL server was installed on the same host as the eventlogging-consumer
  • NOTE: tried this with four different batch sizes (400, 1000, 4000, 8000) and it didn't seem to make a substantive difference, so I just used the default configured batch size of 400. This suggests that the performance bottleneck is most likely in preparing and organizing the events for insertion, not in the insertion itself (see the sketch after this list).
  • 4 runs average: 11.8945 seconds, so ~2647 events per second
  • command: time cat ~/out.load.30k | python eventlogging-consumer stdin:// mysql://root:root@127.0.0.1/log?charset=utf8
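
For reference, a minimal sketch of the kind of batched insert being timed here; the table name, columns, and the pymysql driver are assumptions for illustration, not necessarily what eventlogging-consumer actually uses:

  # sketch: time batched MySQL inserts with executemany, varying BATCH_SIZE
  import time
  import pymysql  # assumed driver

  conn = pymysql.connect(host='127.0.0.1', user='root', password='root',
                         db='log', charset='utf8')
  rows = [('SchemaA', 1, '{"value": %d}' % i) for i in range(30000)]  # made-up rows

  BATCH_SIZE = 400
  start = time.time()
  with conn.cursor() as cur:
      for i in range(0, len(rows), BATCH_SIZE):
          cur.executemany(
              'INSERT INTO events (schema_name, revision, event) VALUES (%s, %s, %s)',
              rows[i:i + BATCH_SIZE])
      conn.commit()
  print('%d rows in %.1fs' % (len(rows), time.time() - start))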

Benchmarking DB inserts

When batching is turned on (i.e. events for distinct schemas are grouped together), these are some numbers we get on vanadium:

  • Inserting 300 events takes about 1 sec
  • Inserting 400 events takes less than 2 secs (~1.6)
  • Inserting 500 events takes about 2 secs
  • Inserting 600 events takes over 2 secs
  • Inserting 100/1500 events takes 3.5 secs
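
Read as throughput, those numbers are roughly consistent with each other: 300 events in ~1 sec is ~300/sec, 400 in ~1.6 secs is 250/sec, 500 in ~2 secs is 250/sec, and 600 in a bit over 2 secs is just under 300/sec, so batched inserts on vanadium land somewhere around 250-300 events per second.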