Analytics/Systems/Piwik

From Wikitech

Our production Piwik instance can be reached at https://piwik.wikimedia.org. It has two layers of authentication: the first uses LDAP credentials, the second a Piwik username and password. To get the password, talk to Dan Andreescu (milimetric on IRC).

Since June 28th 2018, Piwik has been called Matomo, but it is reached the same way (website, credentials, etc.).

Administration

Invalidate old reports

In the past, the daily archiver cron (responsible for generating daily/monthly/yearly stats for every Piwik site) has skipped days of data, leading to reports like https://phabricator.wikimedia.org/T188559 (data collected but not archived, hence flat graphs). A quick way to force Piwik to rerun its archival process over past data is to invalidate that data:

elukey@bohrium:/var/log/piwik$ for el in {20..28}; do sudo -u www-data /usr/share/piwik/console core:invalidate-report-data --dates=2018-02-$el --sites=3; done
Invalidating day periods in 2018-02-20 [segment = ]...
Invalidating week periods in 2018-02-20 [segment = ]...
Invalidating month periods in 2018-02-20 [segment = ]...
Invalidating year periods in 2018-02-20 [segment = ]...
Invalidating day periods in 2018-02-21 [segment = ]...
Invalidating week periods in 2018-02-21 [segment = ]...
Invalidating month periods in 2018-02-21 [segment = ]...
Invalidating year periods in 2018-02-21 [segment = ]...
Invalidating day periods in 2018-02-22 [segment = ]...
[..cut..]

In this example, data from 20/02/2018 to 28/02/2018 was invalidated via Piwik's console for website ID 3 (currently iOS).
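The `{20..28}` brace expansion in the loop above only covers days within a single month. Below is a minimal sketch for an arbitrary inclusive range, for example one that crosses from February into March. It assumes GNU `date` and issues the same console command once per day. The function `invalidate_range`, the script name `invalidate_range.sh` and the `PIWIK_CONSOLE`/`DRY_RUN` variables are hypothetical names introduced here. By default it only prints the commands, so they can be reviewed before setting `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: invalidate Piwik report data for every day in an inclusive date
# range, one console call per day (same command as the loop above).
# Hypothetical helper; assumes GNU date. PIWIK_CONSOLE and DRY_RUN are
# illustrative knobs, not part of Piwik.
set -euo pipefail

PIWIK_CONSOLE="${PIWIK_CONSOLE:-/usr/share/piwik/console}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

invalidate_range() {
  local start="$1" end="$2" site="$3"
  local day="$start" end_key
  # Compare dates as YYYYMMDD integers; -u avoids any DST surprises.
  end_key="$(date -u -d "$end" +%Y%m%d)"
  while [ "$(date -u -d "$day" +%Y%m%d)" -le "$end_key" ]; do
    local cmd=(sudo -u www-data "$PIWIK_CONSOLE" core:invalidate-report-data
               "--dates=$day" "--sites=$site")
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
    # GNU date handles month/year rollover (2018-02-28 + 1 day = 2018-03-01).
    day="$(date -u -d "$day + 1 day" +%Y-%m-%d)"
  done
}

# Run as a script: ./invalidate_range.sh START END SITE_ID
if [ "$#" -eq 3 ]; then
  invalidate_range "$1" "$2" "$3"
fi
```

For example, `./invalidate_range.sh 2018-02-27 2018-03-02 3` prints four commands, one per day from 2018-02-27 to 2018-03-02. The invalidated periods should then be re-archived by the next `core:archive` run.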

Tuning

We ran into an expected performance problem when tracking a larger website and fixed it by following Piwik's advice on auto-archiving: http://piwik.org/docs/setup-auto-archiving/

The cron job we set up with this technique (managed by Puppet) is:

root@bohrium:/var/log/apache2# crontab -u www-data -l
# HEADER: This file was autogenerated at 2017-05-05 12:01:17 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: piwik_archiver
MAILTO=analytics-alerts@wikimedia.org
0 8 * * * [ -e /usr/share/piwik/console ] && [ -x /usr/bin/php ] && nice /usr/bin/php /usr/share/piwik/console core:archive --url="piwik.wikimedia.org" >> /var/log/piwik/piwik-archive.log
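A skipped archiver run is usually noticed only later as flat graphs, so a cheap complementary check is to look at when the archiver last wrote to its log. The following is a minimal sketch, not something deployed on bohrium. The function name `archive_log_is_fresh` and the 25-hour threshold (one daily run at 08:00 plus an hour of slack) are assumptions. It relies on GNU `stat` and assumes every `core:archive` run appends at least one line to `/var/log/piwik/piwik-archive.log`, because the cron's `>>` redirect alone does not update the file's modification time if nothing is written.

```shell
#!/usr/bin/env bash
# Sketch: detect a skipped archiver run by checking how recently the
# archiver log was modified. Hypothetical helper, not deployed anywhere.
# Assumes GNU coreutils (stat -c) and that each core:archive run writes
# at least one line to the log.
set -euo pipefail

# Succeeds (status 0) if LOGFILE exists and was modified within the last
# MAX_AGE_HOURS hours; fails (status 1) otherwise.
archive_log_is_fresh() {
  local logfile="$1" max_age_hours="$2"
  [ -f "$logfile" ] || return 1
  local now mtime
  now="$(date +%s)"
  mtime="$(stat -c %Y "$logfile")"   # last modification, seconds since epoch
  [ $(( now - mtime )) -le $(( max_age_hours * 3600 )) ]
}
```

For example, `archive_log_is_fresh /var/log/piwik/piwik-archive.log 25 || echo "archiver log is stale"` could be run from a monitoring check shortly after the 08:00 cron.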

Known outages

  • Nov 23rd 2017: due to a Ganeti failure (more details in https://phabricator.wikimedia.org/T181121), the bohrium virtual machine (running Piwik and its MySQL database) was stopped non-gracefully, which corrupted an InnoDB table. We had to restore the last MySQL backup, taken on Nov 22, so almost all the data for Nov 23 was not recorded.
  • June 27th 2018: 8 hours of downtime to upgrade the Piwik database to the new version - T192298
  • June 28th 2018: 1 hour of downtime to upgrade Piwik to Matomo