Data Engineering/Systems/Matomo

Matomo (formerly known as Piwik) is a web analytics platform that we use for microsites (roughly 10,000 requests per day or fewer). Our production instance can be reached at https://piwik.wikimedia.org. It has two layers of authentication: the first uses LDAP credentials and the second uses a Matomo-specific username and password.

Access

To access Matomo, you need wmf or nda LDAP access. For more details, see Analytics/Data access#LDAP access.

If you have that access, you can log in at piwik.wikimedia.org with your Wikitech username and password.

After the LDAP login, there is a second login form that we don't need but cannot easily remove. To log in, use the username design and password design.

How to instrument

Piwik (now called Matomo) does some tracking out of the box, such as counting pageviews and unique devices. You can instrument further using its API.

To configure Matomo to work with a Content Security Policy (CSP), see the docs about adding script-src, connect-src, and img-src to the CSP header.
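
As a rough illustration of those three directives, the shell sketch below assembles a header value that lets a page load the tracker script from the Matomo host and send tracking requests to it. The function name matomo_csp_header and the 'self' baseline are assumptions made for this example; the right policy depends on the microsite, so treat the upstream docs as the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Matomo): print a CSP header that
# permits the tracker script, its requests, and its tracking pixel.
# The 'self' source is an assumed baseline for the site's own assets.
matomo_csp_header() {
    local origin="${1:-https://piwik.wikimedia.org}"
    printf "Content-Security-Policy: script-src 'self' %s; connect-src 'self' %s; img-src 'self' %s\n" \
        "$origin" "$origin" "$origin"
}
```

Calling matomo_csp_header with no argument prints a policy for https://piwik.wikimedia.org. Note that a policy like this blocks inline scripts, so an inline tracking snippet would also need a nonce or hash, or would have to be moved into an external file.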

Administration

When a team requests a Piwik beacon

  • Go to Piwik and log in as the admin user
  • Click Settings
  • Websites -> Manage -> Add Site

Adding a site will create some tracking code like:

<script type="text/javascript">
var _paq = _paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
 var u="//piwik.wikimedia.org/";
 _paq.push(['setTrackerUrl', u+'piwik.php']);
 _paq.push(['setSiteId', '19']);
 var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
 g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
})();
</script>
  • Enable the piwik user to see the site (the password is on stat1007 under /home/nuria; this is a regular user, not an admin)
  • Done

Once the snippet is in place, visits will start coming in (reports are run once a day).

Rerun reports for all websites

php /usr/share/matomo/console core:archive --force-all-websites --force-all-periods=86400

With --force-all-periods=86400, this covers websites that had visits in the last day (86400 seconds).

Invalidate old reports

In the past, the daily archiver cron (responsible for generating daily/monthly/yearly stats for every Piwik domain) has skipped days of data, leading to reports like https://phabricator.wikimedia.org/T188559 (data collected but not archived, hence flat graphs). There is a quick way to force Piwik to rerun its archival process over past data: invalidate it.

elukey@bohrium:/var/log/matomo$ for el in {20..28}; do sudo -u www-data /usr/share/matomo/console core:invalidate-report-data --dates=2018-02-$el --sites=3; done
Invalidating day periods in 2018-02-20 [segment = ]...
Invalidating week periods in 2018-02-20 [segment = ]...
Invalidating month periods in 2018-02-20 [segment = ]...
Invalidating year periods in 2018-02-20 [segment = ]...
Invalidating day periods in 2018-02-21 [segment = ]...
Invalidating week periods in 2018-02-21 [segment = ]...
Invalidating month periods in 2018-02-21 [segment = ]...
Invalidating year periods in 2018-02-21 [segment = ]...
Invalidating day periods in 2018-02-22 [segment = ]...
[..cut..]

In this example data from 20/02/2018 to 28/02/2018 has been invalidated via Piwik's console for website id 3 (currently iOS).
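
The brace expansion above only works within a single month. A minimal sketch of a more general loop follows, assuming GNU date is available on the host; the function name matomo_invalidate_plan is made up for this example. It only prints the commands, one per day, so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one core:invalidate-report-data command per
# day from START to END (inclusive, YYYY-MM-DD) for the given site id.
# Requires GNU date. Nothing is executed here.
matomo_invalidate_plan() {
    local start="$1" end="$2" site="$3"
    local day end_s
    day="$(date -u -d "$start" +%F)" || return 1
    end_s="$(date -u -d "$end" +%s)" || return 1
    # Compare as epoch seconds so the loop also stops on reversed ranges.
    while [ "$(date -u -d "$day" +%s)" -le "$end_s" ]; do
        echo "sudo -u www-data /usr/share/matomo/console core:invalidate-report-data --dates=$day --sites=$site"
        day="$(date -u -d "$day +1 day" +%F)"
    done
}
```

For example, matomo_invalidate_plan 2018-02-20 2018-02-28 3 prints the same nine commands as the loop above; piping the output to sh would execute them.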

Tuning

We had an expected performance problem while tracking a larger website, which we fixed by following upstream's advice: http://piwik.org/docs/setup-auto-archiving/

The cron we set up with this technique is:

root@matomo1001:/var/log/apache2# crontab -u www-data -l
# HEADER: This file was autogenerated at 2017-05-05 12:01:17 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: piwik_archiver
MAILTO=analytics-alerts@wikimedia.org
0 8 * * * [ -e /usr/share/matomo/console ] && [ -x /usr/bin/php ] && nice /usr/bin/php /usr/share/matomo/console core:archive --url="piwik.wikimedia.org" >> /var/log/matomo/matomo-archive.log
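
Since the cron appends to /var/log/matomo/matomo-archive.log on every run, one cheap way to notice a skipped run is to check how recently that file was modified. The sketch below is only an illustration: the function name log_is_fresh and the 25-hour default are assumptions, and a fresh log does not prove that archiving succeeded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE exists and was modified within
# the last MINUTES minutes (default 1500, i.e. 25 hours).
log_is_fresh() {
    local file="$1" minutes="${2:-1500}"
    [ -f "$file" ] && [ -n "$(find "$file" -mmin "-$minutes")" ]
}
```

A possible use is log_is_fresh /var/log/matomo/matomo-archive.log || echo "archiver may have skipped a run".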

Known outages

  • Nov 23rd 2017: due to a Ganeti failure (more details in https://phabricator.wikimedia.org/T181121), the bohrium virtual machine (running Piwik and its MySQL database) was stopped ungracefully, which corrupted InnoDB tables. We had to restore the last MySQL backup, taken on Nov 22, so almost all the data for Nov 23 is missing.
  • June 27th 2018: 8 hours of downtime to upgrade the Piwik database to the new version - T192298
  • June 28th 2018: 1 hour of downtime to upgrade Piwik to Matomo
  • October 4th 2018: 1 hour of downtime to move Matomo/Piwik from bohrium to matomo1001 (new host).
  • October 5th 2018: 1 hour of downtime to fix a database issue.
  • December 5th 2018: 30 mins of downtime to upgrade to 3.7.0
  • May 14th 2020: 10 minutes of downtime to upgrade to 3.13.3
  • March 11 2021: 10 minutes of downtime to upgrade kernel to 4.19.171-2
  • May 4 2022: 10 minutes of downtime to upgrade kernel to 4.19.235-1

Upgrade Matomo

Example upgrades: https://phabricator.wikimedia.org/T252741 and the more recent https://phabricator.wikimedia.org/T275144

First, check the current version on the matomo host and review the changelog/diff against the latest upstream release at https://matomo.org/changelog. The things to check on the subpage for each version are usually:

  • if there are major database upgrades
  • if there are breaking changes (that may require a puppet change for the config, etc.)
  • if there are security vulnerabilities

The last point is very important since the Matomo instance is exposed to the public Internet. Usually if there are no major database upgrades it should be safe to upgrade directly in production, otherwise please test in labs first.

Procedure to upgrade:

NOTE: Matomo has stopped publishing debs; see http://debian.matomo.org/ and https://github.com/matomo-org/matomo-package/issues/131

Eventually we will have to package it ourselves; for now, we are using the latest published .deb, 3.14.1-2.

  • go to apt1001 and execute reprepro --noskipold --ignore=forbiddenchar --component thirdparty/matomo checkupdate buster-wikimedia (or change to different distribution if the host has been upgraded)
  • check if the version that you need is listed, otherwise upstream might not have published it yet. If you find it, execute the above command again, replacing checkupdate with update
  • go to matomo100X, disable puppet and set Matomo in maintenance mode (edit /etc/matomo/config.ini.php, there are two options to change: maintenance_mode and record_statistics). It requires an apache2 restart to take effect.
  • take a mysql snapshot of the database (something like mysqldump piwik > piwik-202005141149.sql)
  • apt-get update and apt-get install matomo
  • Follow the instructions, but generally you'll need to run sudo -u www-data /usr/bin/php /usr/share/matomo/console core:update
  • re-enable and run puppet and restart apache2
  • check that everything looks good
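
The host-side steps above can be sketched as a dry-run plan that prints the commands in order without running them. The function name matomo_upgrade_plan is made up for this example, and the puppet agent invocations are the generic Puppet CLI, assumed here; the host may provide its own wrappers for disabling and running puppet.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the upgrade steps in order.
# Nothing is executed; review the output and run each step by hand.
matomo_upgrade_plan() {
    # Timestamped dump name, in the same style as piwik-202005141149.sql.
    local dump="piwik-$(date +%Y%m%d%H%M).sql"
    cat <<EOF
puppet agent --disable 'matomo upgrade'   # assumed generic Puppet CLI
# edit /etc/matomo/config.ini.php: set maintenance_mode and record_statistics
systemctl restart apache2
mysqldump piwik > $dump
apt-get update
apt-get install matomo
sudo -u www-data /usr/bin/php /usr/share/matomo/console core:update
puppet agent --enable                     # assumed generic Puppet CLI
puppet agent --test                       # assumed generic Puppet CLI
systemctl restart apache2
EOF
}
```

Generating the dump name once and printing it in the plan keeps the snapshot file easy to find if a rollback is needed.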

Restart Matomo

  • The matomo mysql database is backed up to db1208, so start by downtiming its matomo-related services checks in Icinga. The total downtime should be no more than 15 minutes.
  • Make a patch for Matomo to be in maintenance mode (see for example https://gerrit.wikimedia.org/r/c/operations/puppet/+/670559)
  • Merge the patch, ssh to matomo1002.eqiad.wmnet, and run puppet to apply it on matomo
  • Restart apache for the settings to apply: sudo systemctl restart apache2
  • ssh to db1108 and disable replication by connecting with sudo mysql -S /run/mysqld/mysqld.matomo.sock, then stop slave;
  • On matomo1002, stop mariadb with sudo systemctl stop mariadb
  • Reboot the node via sudo cookbook sre.hosts.reboot-single matomo1002.eqiad.wmnet
  • Re-enable replication on db1108 with sudo mysql -S /run/mysqld/mysqld.matomo.sock then start slave;
  • Turn off maintenance mode and enable record_statistics with another puppet patch, merge it and run puppet on matomo1002
  • Restart apache for these settings to apply
  • ssh to db1108 and ensure that replication is working with show slave status \G. The output should have:
                 Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
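
The final check can also be scripted rather than read by eye. The sketch below is one possible way to do it; the function name replication_ok is an assumption for this example. It reads the output of show slave status \G on standard input and succeeds only if both threads report Yes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `show slave status \G` output on stdin and
# exit 0 only if both replication threads are running.
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible invocation on the replica is sudo mysql -S /run/mysqld/mysqld.matomo.sock -e 'show slave status \G' | replication_ok && echo "replication OK". Matching on the exact field name avoids confusing Slave_SQL_Running with the separate Slave_SQL_Running_State line.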