
Fundraising Analytics/Impression Stats

From Wikitech

Stats now arrive via Kafka. A special fr-tech-ops script reconstitutes files that look like web logs, and a Django script then processes them.

We want to switch over to FRUEC.

Everything below appears to be very outdated.


Banner impressions and landing page stats are collected from Squid logs via udp2log running on locke. Every 15 minutes a cron job in file_mover@locke's crontab rotates the log files into a local buffer directory, where they are retained for 7 days. The script also copies the files over NFS to the local NetApp, nas1-a.pmtpa.wmnet, which is mirrored offsite to nas1001-a.eqiad.wmnet. Finally, the NetApps are also NFS-mounted on grosley/aluminium, where analytics scripts parse the files.

Counting banner impressions is not fun. Currently, we send a beacon to Special:RecordImpression, or /beacon/RecordImpression..., with GET parameters that identify the banner and campaign, the selection criteria, and the outcome (whether the banner was hidden or shown).
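As a rough illustration of what a beacon request carries, here is a minimal Python sketch that pulls the GET parameters out of a request URL. The parameter names (banner, campaign, result) and the example values are assumptions made for illustration; check the real beacon requests for the actual names.

```python
from urllib.parse import urlsplit, parse_qs


def parse_beacon(url):
    """Return the beacon's GET parameters as a flat dict (first value wins)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical request: the parameter names and values are illustrative only.
example = "/beacon/RecordImpression?banner=B_example&campaign=C_example&result=hide"
params = parse_beacon(example)
print(params["banner"], params["campaign"], params["result"])
```

Note that parse_qs silently drops parameters with empty values, which may or may not be what you want when auditing beacons.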

These Varnish hits are sampled using udp-filter, configured here:

https://phabricator.wikimedia.org/diffusion/OPUP/browse/master/templates/udp2log/filters.erbium.erb;34ee0c968c8a61cf33f0885dd68b8cf49872b7c1

udp2log proxy log collection

udp2log is configured via two entries in locke:/etc/udp2log/squid:

# Landing pages
pipe 1 /a/squid/fundraising/lp-filter >> /a/squid/fundraising/logs/landingpages.log

# Banner Impressions
pipe 100 /a/squid/fundraising/bi-filter >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log

To enable/disable, uncomment/comment these lines and then HUP udp2log:

awjrichards@locke:~$ /home/file_mover/scripts/resetudp2log 
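
As we understand the udp2log config, the number after "pipe" is the sampling factor: 1 passes every line to the filter, while 100 passes roughly one line in 100. Anything counted from the sampled banner impression log therefore has to be scaled back up. A minimal sketch of that arithmetic follows; the helper name and the lookup table are hypothetical and are not taken from the analytics scripts.

```python
# Sampling factors as they appear in the two "pipe" entries above.
SAMPLING_FACTORS = {
    "landingpages.log": 1,
    "bannerImpressions-sampled100.log": 100,
}


def estimate_total(log_name, sampled_lines):
    """Scale a sampled line count up to an estimate of the real total."""
    return sampled_lines * SAMPLING_FACTORS[log_name]


# 42 sampled impression lines stand for roughly 4200 real impressions.
print(estimate_total("bannerImpressions-sampled100.log", 42))
```

The result is an estimate, so small sampled counts carry a large relative error.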

proxy log rotation and archiving

Log rotation, compression, and copy to netapp is handled by a cron job running as user file_mover@locke:

# rotate and compress fundraising banner impression logs, and archive to netapp
*/15 * * * * /home/file_mover/scripts/rotate_fundraising_logs

analytics processing script

The sampled files are digested by the "Banner impressions loader" Jenkins job, using the following code from the DjangoBannerStats repo. The impression counts are aggregated into the "pgehres" database.

export PYTHONPATH=/etc/fundraising
python /srv/DjangoBannerStats/manage.py LoadLPImpressions --verbose --recent
python /srv/DjangoBannerStats/manage.py LoadBannerImpressions2Aggregate --verbose --top --recent

All this must be changed. The "banner history" feature and deterministic banner loading are meant to replace all of this, some time in 2015.

monitoring and debugging

The cron script logs verbosely; locke:/var/log/syslog shows its actions and errors.

Here's what's what in the following examples:

/a/squid/fundraising/logs/*log # active udp2log collection point
/a/squid/fundraising/logs/buffer/2012/*.log # freshly rotated log, before compression
/a/squid/fundraising/logs/fr_archive # netapp nfs mount (i.e. permanent archive location)

Under normal operation, you should see this sequence:

Sep  6 17:45:01 locke CRON[28592]: (file_mover) CMD (/home/file_mover/scripts/rotate_fundraising_logs)
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: move /a/squid/fundraising/logs/landingpages.log to /a/squid/fundraising/logs/buffer/2012/landingpages-20120906-174501.log
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: move /a/squid/fundraising/logs/bannerImpressions-sampled100.log to /a/squid/fundraising/logs/buffer/2012/bannerImpressions-sampled100-20120906-174501.log
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: reload udp2log
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: gzip /a/squid/fundraising/logs/buffer/2012/bannerImpressions-sampled100-20120906-174501.log
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: gzip /a/squid/fundraising/logs/buffer/2012/landingpages-20120906-174501.log
Sep  6 17:45:01 locke rotate_fundraising_logs[28594]: rsync -ar /a/squid/fundraising/logs/buffer/ /a/squid/fundraising/logs/fr_archive/
Sep  6 17:45:02 locke rotate_fundraising_logs[28594]: done!
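
The naming pattern visible in the log lines above (a per-year buffer directory and a YYYYMMDD-HHMMSS suffix) can be sketched in Python as follows. This is inferred from the sample output only; the real rotate_fundraising_logs script may build its paths differently, and the function name is hypothetical.

```python
from datetime import datetime
import posixpath

LOG_DIR = "/a/squid/fundraising/logs"


def rotated_path(active_log, when):
    """Build the buffer path a rotated log appears to get, based on the samples."""
    base, ext = posixpath.splitext(posixpath.basename(active_log))
    stamp = when.strftime("%Y%m%d-%H%M%S")
    year = when.strftime("%Y")
    return posixpath.join(LOG_DIR, "buffer", year, f"{base}-{stamp}{ext}")


print(rotated_path(LOG_DIR + "/landingpages.log", datetime(2012, 9, 6, 17, 45, 1)))
# → /a/squid/fundraising/logs/buffer/2012/landingpages-20120906-174501.log
```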

Things to watch out for include:

  1. move/gzip errors due to a full local partition or a permissions problem
  2. nfs mount inaccessible
  3. udp2log HUP fails
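
One quick way to spot any of these failures is to look for rotation runs that never logged "done!". The Python sketch below scans syslog lines in the format shown earlier and reports the PIDs of such runs. It is a hypothetical helper, not part of the existing tooling, and it assumes PIDs are not reused within the lines being scanned; the most recent run may simply still be in progress.

```python
import re

# Matches lines written by the rotation script itself, e.g.
# "... locke rotate_fundraising_logs[28594]: done!"
LINE_RE = re.compile(r"rotate_fundraising_logs\[(\d+)\]: (.*)$")


def unfinished_runs(syslog_lines):
    """Return PIDs (in order of first appearance) of runs with no "done!" line."""
    seen = []
    finished = set()
    for line in syslog_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # CRON lines and unrelated syslog entries are skipped
        pid, message = match.groups()
        if pid not in seen:
            seen.append(pid)
        if message.strip() == "done!":
            finished.add(pid)
    return [pid for pid in seen if pid not in finished]
```

For example, unfinished_runs(open("/var/log/syslog")) would list the runs worth investigating on locke.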