Analytics/Systems/EventLogging/Performance

Put any performance-related EventLogging information on this page.

Event Logging load test

Setup:

  • using analytics1004.eqiad.wmnet (as of this writing), a beefy if not trusty Cisco box
  • 31492 events, 4 schemas, split 3/3/3/1, generated by the wonderful test_load.py
  • format: seqId[space]?[URL-encoded JSON event] (a sketch of a generator for this format follows this list)
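
For reference, here is a minimal sketch of how a test file in that format could be generated. This is not test_load.py itself; the schema names, event fields, and output path are made up for illustration.

  # sketch: generate "seqId ?url-encoded-json" lines (hypothetical, not the real test_load.py)
  import json
  import random
  import urllib.parse

  SCHEMAS = ['SchemaA', 'SchemaB', 'SchemaC', 'SchemaD']  # placeholder schema names

  with open('load.test.seq.30k', 'w') as out:
      for seq_id in range(1, 31493):
          capsule = {
              'schema': random.choice(SCHEMAS),
              'revision': 1,
              'event': {'value': random.randint(0, 100)},
          }
          # eventlogging-processor's %q field expects a URL-encoded payload starting with '?'
          out.write('%d ?%s\n' % (seq_id, urllib.parse.quote(json.dumps(capsule))))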

Processor

  • 4 runs average: 17.6478 seconds, so ~1784 events per second
  • command: time cat /home/milimetric/load.test.seq.30k | python ./eventlogging-processor "%{seqId}d %q" stdin:// file:///home/milimetric/out.load.30k
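
The events-per-second figures on this page are simply the event count divided by the average wall-clock time reported by time, e.g. for this run:

  # 31492 events over an average of 17.6478 seconds (4 runs)
  print(31492 / 17.6478)  # ~1784 events per second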

Processor (take 2)

  • using a patch from Andrew (https://gerrit.wikimedia.org/r/#/c/221664/), but it shouldn't affect performance for file reading
  • now consuming from a file generated with kafkacat, in the standard format expected in prod
  • 30000 events exactly, as sampled from the prod stream, so some of them are invalid (truncated, bad data, etc)
  • 4 runs average: 21.3743 seconds, so ~1403 events per second
  • command: time cat /home/milimetric/load.test.sample | python bin/eventlogging-processor "%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i" stdin:// file:///home/milimetric/out.load.sample
  • It seems even small variations in format and in the number of invalid events affect performance quite a bit, so we should leave ourselves a fairly large margin in case we get a spike of invalid events (see the sketch below)
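
To probe that margin, one could deliberately corrupt a fraction of an otherwise valid test file and rerun the benchmark. A rough sketch; the file names and the truncation strategy here are made up:

  # sketch: truncate a random fraction of lines to simulate a spike of invalid events
  import random

  INVALID_FRACTION = 0.2  # arbitrary; vary this and watch the events-per-second figure

  with open('load.test.sample') as src, open('load.test.sample.corrupted', 'w') as out:
      for line in src:
          if random.random() < INVALID_FRACTION:
              line = line[:len(line) // 2].rstrip('\n') + '\n'  # chop the payload in half
          out.write(line)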

Processor (take 3 - direct from kafka)

  • reading from the beginning of a Kafka topic so we're not limited by the number of events
  • command: time python bin/eventlogging-processor "%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i" "kafka:///analytics1012.eqiad.wmnet:9092?topic=eventlogging-client-side&auto_offset_reset=smallest" file:///home/milimetric/out.load.from.kafka
  • 40.229 seconds: 40467 events
  • 78.812 seconds: 78185 events
  • 54.090 seconds: 65253 events
  • Average: ~1062 events per second, with a fairly high variance, again suggesting we should give ourselves a large margin
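
For comparison, the raw read speed of the topic (no parsing or validation at all) can be measured with a plain Kafka client. A sketch using the kafka-python package; this is not what eventlogging-processor does internally, and the 30-second idle timeout is an arbitrary choice:

  # sketch: count how fast we can merely read eventlogging-client-side from the beginning
  import time
  from kafka import KafkaConsumer  # assumes the kafka-python package is installed

  consumer = KafkaConsumer(
      'eventlogging-client-side',
      bootstrap_servers='analytics1012.eqiad.wmnet:9092',
      auto_offset_reset='earliest',  # start from the beginning, like the benchmark above
      consumer_timeout_ms=30000,     # give up after 30s with no new messages
  )

  start = time.time()
  count = sum(1 for _ in consumer)
  elapsed = time.time() - start
  print('%d messages in %.1fs (~%.0f/s)' % (count, elapsed, count / elapsed))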

Processor (take 4 - direct from kafka but with custom code to make all events invalid)

  • Idea: does the processing rate suffer if all events are invalid?
  • 18.938 seconds, 13348 events, 704 per second

Processor (take 5 - all events invalid, no console logging to slow things down)

  • Idea: does the logger output, which goes to the console in these tests, slow down the processing?
  • 28.326 seconds, 76995 events, 2719 per second!!!
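
The easy way to check this is to rerun the same load with the logger effectively silenced and compare wall-clock times. A generic sketch of the idea (the logging setup here is illustrative, not eventlogging's actual logger configuration):

  # sketch: compare a logging-per-event loop with and without console output
  import logging
  import sys
  import time

  logging.basicConfig(stream=sys.stderr, level=logging.INFO)
  log = logging.getLogger('bench')

  def run(lines, silent):
      log.setLevel(logging.CRITICAL if silent else logging.INFO)
      start = time.time()
      for line in lines:
          # stand-in for "event failed validation, log it and move on"
          log.info('Unable to process: %s', line)
      return time.time() - start

  lines = ['bad event %d' % i for i in range(100000)]
  print('with logging: %.3fs' % run(lines, silent=False))
  print('silenced:     %.3fs' % run(lines, silent=True))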

Processor (take 6)

  • using Andrew's fancy parallel processor
  • 1 process, 100k events: 71.439 seconds, 68.734
  • 2 processes, 100k events: 35.335 seconds, 35.794
  • 5 processes, 100k events: 15.269 seconds, 15.124
  • 7 processes, 100k events: 12.959 seconds, 12.772
  • 10 processes, 100k events: 11.291 seconds, 11.xxx
  • 12 processes, 100k events: 11.000 seconds, 10.831
  • 24 processes, 100k events: 9.713 seconds, 9.843
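
The numbers above are from Andrew's parallel processor itself; purely to illustrate the shape of the approach, here is a generic multiprocessing sketch (not the actual eventlogging code) that fans line processing out over N worker processes:

  # sketch: generic fan-out of per-line work across worker processes
  from multiprocessing import Pool

  def process_line(line):
      # stand-in for parsing, validating and serializing one raw event
      return line.strip().upper()

  if __name__ == '__main__':
      with open('load.test.sample') as f:
          lines = f.readlines()
      with Pool(processes=12) as pool:  # vary this like the table above
          results = pool.map(process_line, lines, chunksize=1000)
      print('processed %d events' % len(results))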

Consumer

  • NOTE: the MySQL server was installed on the same host as the eventlogging-consumer
  • NOTE: tried this with four different batch sizes (400, 1000, 4000, 8000) and it didn't seem to make a substantive difference, so I just used the default configured batch size of 400. This suggests that the performance bottleneck is most likely in preparing and organizing the events for insertion, not in the insertion itself (see the sketch after this list).
  • 4 runs average: 11.8945 seconds, so ~2647 events per second
  • command: time cat ~/out.load.30k | python eventlogging-consumer stdin:// mysql://root:root@127.0.0.1/log?charset=utf8
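
For reference, a minimal sketch of the kind of batched insert being timed here; the table name, columns, and the pymysql driver are assumptions for illustration, not necessarily what eventlogging-consumer actually uses:

  # sketch: time batched MySQL inserts with executemany, varying BATCH_SIZE
  import time
  import pymysql  # assumed driver

  conn = pymysql.connect(host='127.0.0.1', user='root', password='root',
                         db='log', charset='utf8')
  rows = [('SchemaA', 1, '{"value": %d}' % i) for i in range(30000)]  # made-up rows

  BATCH_SIZE = 400
  start = time.time()
  with conn.cursor() as cur:
      for i in range(0, len(rows), BATCH_SIZE):
          cur.executemany(
              'INSERT INTO events (schema_name, revision, event) VALUES (%s, %s, %s)',
              rows[i:i + BATCH_SIZE])
      conn.commit()
  print('%d rows in %.1fs' % (len(rows), time.time() - start))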

Benchmarking DB inserts

When batching is turned on (i.e. events for distinct schemas are grouped together), these are some numbers we get on vanadium:

  • Inserting 300 events takes about 1 sec
  • Inserting 400 events takes less than 2 secs (~1.6)
  • Inserting 500 events takes about 2 secs
  • Inserting 600 events takes over 2 secs
  • Inserting 100/1500 events takes 3.5 secs
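
Read as throughput, those numbers are roughly consistent with each other: 300 events in ~1 sec is ~300/sec, 400 in ~1.6 secs is 250/sec, 500 in ~2 secs is 250/sec, and 600 in a bit over 2 secs is just under 300/sec, so batched inserts on vanadium land somewhere around 250-300 events per second.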