Analytics/Systems/EventLogging/Backfilling

New way

On eventlog1001.eqiad.wmnet, find the log file that you want to backfill from. Optionally extract only part of it by piping the zcat output through tail and head (for example, | tail -n +2000000 | head -n 10000 selects 10000 lines starting at line 2000000). Let's say we picked /srv/log/eventlogging/archive/all-events.log-20151015.gz and wanted to backfill only the first 100 events:

ionice zcat /srv/log/eventlogging/archive/all-events.log-20151015.gz | head -n 100 | python eventlogging-consumer stdin:// 'mysql://eventlog:<<password found in /etc/eventlogging.d/consumers/mysql-m4-master>>@m4-master.eqiad.wmnet/log?charset=utf8&statsd_host=statsd.eqiad.wmnet&replace=True'
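
Before piping a slice into the consumer, it can help to check that the tail / head window selects the lines you expect. The following is a minimal sketch that runs against a throwaway gzip file in a temporary directory rather than a real archive; slice_events is a hypothetical helper name, not part of EventLogging.

```shell
#!/bin/bash
# Sketch: select COUNT lines starting at line START from a gzipped log.
# slice_events is a hypothetical helper, not part of EventLogging.
slice_events() {
    local file="$1" start="$2" count="$3"
    # tail -n +START begins output at line START (1-based);
    # head -n COUNT then keeps only the first COUNT of those lines.
    zcat "$file" | tail -n "+${start}" | head -n "${count}"
}

# Demo on generated data standing in for an all-events archive.
demo_dir=$(mktemp -d)
seq 1 1000 | gzip > "${demo_dir}/sample.log.gz"

slice_events "${demo_dir}/sample.log.gz" 201 100 > "${demo_dir}/slice.txt"
wc -l < "${demo_dir}/slice.txt"      # number of selected lines
head -n 1 "${demo_dir}/slice.txt"    # first selected line
```

Once the slice looks right, the same tail / head arguments can be used in the real pipeline in front of the consumer.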


Loading a full file:

eventlogging@eventlog1001:/home/otto$ cat Translation_Beta_Events.json | python /srv/deployment/eventlogging/analytics/bin/eventlogging-consumer stdin:// 'mysql://xxx:xxx@m4-master.eqiad.wmnet/log?charset=utf8&statsd_host=statsd.eqiad.wmnet&replace=True'

This can be run from the eventlogging machine but also from 1002. If you are running it from 1002, you might need to set:

export PYTHONPATH=/srv/deployment/eventlogging/eventlogging
export DJANGO_SETTINGS_MODULE='THIS IS A BUG'

Also fully qualify all paths, for example:

/usr/bin/python /srv/deployment/eventlogging/eventlogging/bin/eventlogging-consumer @/home/some/consumer-config

Old way (kept as it may be useful)

This document explains how to backfill EL data as of 2015-02-16. Please note that any redesign of the service will affect how backfilling is done.

Prerequisites

You need sudo on Beta Cluster to test the backfilling scripts, and also sudo on eventlog1001 to do the backfilling for real: EventLogging/Testting/BetaCluster

This document describes how to backfill from "processed" events. If you need to backfill from raw events, such as the ones stored in the client-side log, additional steps are needed. The idea is the same, except that a "process" step must be included so that raw events are processed before being inserted into the database.

Note that since this change [1], eventlog1001 only keeps logs for the last 30 days, so an outage should be backfilled as soon as possible.

First step (data preparation)

In the first step, split the logs for the relevant day into files of 64K lines. Files of this size keep you from running into memory issues.

(Having such small files gives good control over what timespan you want to backfill, and it allows for easy parallelization, speed-up, and fine control during data injection.)

Events can be split with a command like this:

 mkdir split && cd split && ionice nice zcat /srv/log/eventlogging/archive/all-events.log-20141114.gz >all-events.log && ionice nice split --lines=64000 all-events.log && rm all-events.log
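
After splitting, a quick sanity check is to confirm that the pieces add up to the original line count. The sketch below does this on generated sample data in a temporary directory; the 64000-line chunk size comes from the command above, while the 150000-line input is an arbitrary stand-in for a real log.

```shell
#!/bin/bash
# Sketch: split a log into 64000-line pieces and check that no line was lost.
# The input is generated sample data, not a real all-events archive.
work_dir=$(mktemp -d)
cd "${work_dir}" || exit 1

seq 1 150000 > all-events.log
total_lines=$(wc -l < all-events.log)

# -l is the short form of --lines; split names its output xaa, xab, xac, ...
split -l 64000 all-events.log
rm all-events.log

piece_count=$(ls x* | wc -l)
split_lines=$(cat x* | wc -l)

echo "pieces: ${piece_count}, lines: ${split_lines} of ${total_lines}"
```

If the two line counts differ, the split was interrupted or the disk filled up, and the pieces should be regenerated before any injection starts.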


Raw Events

If you need to backfill raw events you might find this snippet useful: https://gist.github.com/nuria/e837d16b94c09a4df8a4. Raw event logs (client-side and server-side) include a number of characters that need to be removed before the processors can handle them.

Second step (data injection)

You should test your scripts and code in Beta Cluster before trying this on vanadium.

Checkout a separate clone of EventLogging

The injection is better done using a separate clone of EventLogging. That way the backfilling is not interrupted by other people's EventLogging deployments, and you can use an EventLogging version of your choice.

See, for example, the changes made beforehand to be able to backfill events one by one (not batched): [2]

To run EventLogging from your local checkout you need to change the Python library search path. So, if you checked out the EL code in your home directory, you would need to tell Python where to build it:

cd ~/EventLogging/server
export PYTHONPATH='/home/nuria/backfilling/python'
python ./setup.py develop --install-dir=/home/nuria/backfilling/python

These commands build EL into `/home/nuria/backfilling/python`.

Start a Backfilling Consumer

In a simple for loop over those split files (or over parts of them in parallel), start a separate EventLogging consumer that reads from stdin and writes to m2-master, and pipe the file in. The config for this consumer is just a copy of the m2 consumer's config with its input swapped for stdin. I would rename this config so that the process is easy to find when running htop.

The config looks as follows:

nuria@vanadium:~/backfilling$ more mysql-m2-master-BACKFILLING
stdin://
mysql://some-connection-string?charset=utf8&replace=True
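
One way to produce such a file is to copy the regular consumer config and rewrite only its input line. The sketch below assumes the config lists the input URI on line 1 and the output URI on line 2, as in the listing above; both URIs and the temporary directory are placeholders, not the real settings.

```shell
#!/bin/bash
# Sketch: derive a backfilling config from a copy of the regular consumer config.
# Assumes the input URI is on line 1 and the output URI on line 2;
# both URIs below are placeholders, not real connection settings.
conf_dir=$(mktemp -d)
cat > "${conf_dir}/mysql-m2-master" <<'EOF'
tcp://placeholder-input:0000
mysql://some-connection-string?charset=utf8&replace=True
EOF

# Replace the input URI (line 1) with stdin:// and save under a distinctive name,
# so the process is easy to spot in htop.
sed '1s|.*|stdin://|' "${conf_dir}/mysql-m2-master" > "${conf_dir}/mysql-m2-master-BACKFILLING"

cat "${conf_dir}/mysql-m2-master-BACKFILLING"
```

Rewriting only the first line leaves the output URI and its query parameters exactly as the regular consumer uses them.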

Note that the regular consumer batches events. Using that code as is to backfill is fine if you are dealing with a total outage. If instead events were dropped from within the event stream, you cannot batch insertions. Thus, you might need to change the consumer code to be able to backfill:

I had to make these changes in 2015-02: https://gerrit.wikimedia.org/r/#/c/190139/

To check whether your changes are working (again, in Beta Cluster):

/usr/bin/python -OO ./python/eventlogging-consumer @/home/nuria/backfilling/mysql-m2-master-BACKFILLING  > log-backfill.txt 2>&1

For each of the started consumers (I could only start two without the db falling too far behind), capture stdout, stderr and the exit code in separate (per-input) files. This made it easy to verify that the backfilling did not bail out, and to correlate log files with input files.


A simple shell script to loop over files and consume each:

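 #!/bin/bash

 # Consume each split file with its own backfilling consumer, one at a time.
 for f in 20150208/x*
 do
    # File name without the directory part, e.g. xaa
    l="${f##*/}"
    ionice nice cat "$f" | ionice nice /usr/bin/python -OO ./python/eventlogging-consumer @/home/nuria/backfilling/mysql-m2-master-BACKFILLING > "log-backfill-${l}.txt" 2>&1
    # This deletes the log as soon as the file is consumed; drop this line to keep per-input logs.
    rm "log-backfill-${l}.txt"
 done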
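
The loop above runs one consumer at a time and does not record exit codes. A possible variant, sketched below, keeps one log file and one exit-code file per input and runs at most two consumers in parallel. CONSUMER_CMD is a stand-in (here wc -l, so the sketch runs anywhere); on a real host it would be the eventlogging-consumer invocation shown above. The directory layout and file names are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: feed each split file to a consumer, keeping one log and one exit-code
# file per input, with at most two consumers running at a time.
# CONSUMER_CMD is a stand-in for the real eventlogging-consumer invocation.
CONSUMER_CMD=${CONSUMER_CMD:-"wc -l"}
export CONSUMER_CMD

run_dir=$(mktemp -d)
mkdir "${run_dir}/input" "${run_dir}/logs"
seq 1 250 | split -l 100 - "${run_dir}/input/x"   # sample pieces xaa, xab, xac

# xargs -P 2 keeps at most two consumers running in parallel.
# Each child writes stdout and stderr to a per-input log and the exit
# status to a separate per-input file.
printf '%s\n' "${run_dir}"/input/x* |
    xargs -P 2 -I{} sh -c '
        name=$(basename "$1")
        $CONSUMER_CMD < "$1" > "$2/log-backfill-$name.txt" 2>&1
        echo "$?" > "$2/exit-$name.txt"
    ' _ {} "${run_dir}/logs"

# List the exit-code files that do not contain a plain 0, i.e. failed inputs.
failed=$(grep -L -x 0 "${run_dir}"/logs/exit-*.txt || true)
echo "failed inputs: ${failed:-none}"
```

Keeping the exit code in its own file makes it possible to re-run only the inputs that failed, instead of re-injecting the whole day.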

Monitoring

There are two things to monitor: the database and the eventlogging hosts. You can monitor the eventlogging hosts with htop; the database stats appear here: https://tendril.wikimedia.org/host/view/db1046.eqiad.wmnet/3306