Analytics/Systems/Turnilo

From Wikitech

Turnilo provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of the available data cubes as of April 2017, with their update schedules and other details.)

Access

Before requesting access, please make sure you:

Depending on the above, you can request to be added to either the wmf group or the NDA group. In the task, explain why you need access, and ping the Analytics team if you do not hear back soon from the Opsen on duty.

Once you have a Wikitech login, please create a task like the following so that SRE can grant you access: https://phabricator.wikimedia.org/T160662

Administration

Restart

sudo systemctl restart turnilo

Logs

Everybody can read /var/log/pivot/syslog.log

The Analytics team can also use journalctl:

sudo journalctl -u turnilo -f

The -f flag keeps following the log as new entries arrive; drop it if you only want to see the existing entries.

Deploy

To deploy, run the following on deployment.eqiad.wmnet:

cd /srv/deployment/analytics/pivot/deploy

git pull

git submodule update --init

scap deploy

The code that renders https://turnilo.wikimedia.org runs entirely on analytics-tool1002.eqiad.wmnet and is split into two parts:

Test config changes

  • Make sure you can ssh to Turnilo's box.
  • Run ps -auxfww on the box to see the command Turnilo is started with; it looks something like:
 /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config config.yaml
  • Copy the YAML config file to your home directory on the box and change the port Turnilo listens on (for example, to 9091).
  • Start a second Turnilo process on the box with the same command, pointing --config at your copy.
  • From your own machine, forward the port and browse to http://localhost:9091: ssh -N analytics-some.eqiad.wmnet -L 9091:localhost:9091
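The "change the port" step can be scripted so the copy differs from production only in its port. Below is a minimal sketch, assuming the production config.yaml sets the listening port on an unindented, top-level "port: <number>" line; the function names set_port and write_test_config are hypothetical helpers, not part of Turnilo.

```python
import re
from pathlib import Path

# Assumption: config.yaml sets the listening port with an unindented,
# top-level "port: <number>" line. Indented "port:" keys (inside nested
# blocks) do not match because "^" must be followed directly by "port:".
_TOP_LEVEL_PORT = re.compile(r"^(port:[ \t]*)\d+", re.MULTILINE)


def set_port(config_text: str, new_port: int) -> str:
    """Return config_text with the first top-level port value replaced."""
    if not 1 <= new_port <= 65535:
        raise ValueError(f"invalid TCP port: {new_port}")
    # Keep the "port:" prefix and anything after the number (e.g. a comment).
    new_text, count = _TOP_LEVEL_PORT.subn(
        lambda m: m.group(1) + str(new_port), config_text, count=1
    )
    if count == 0:
        # Refuse to guess: without an explicit port, the test instance
        # might try to bind the same port as the production instance.
        raise ValueError("no top-level 'port:' line found in config")
    return new_text


def write_test_config(src: Path, dst: Path, new_port: int) -> None:
    """Copy the config at src to dst, changing only the port."""
    dst.write_text(set_port(src.read_text(), new_port))


# Example (hypothetical paths; run from the directory holding config.yaml):
# write_test_config(Path("config.yaml"), Path.home() / "config.yaml", 9091)
```

A line-based substitution is used instead of loading and re-dumping the YAML so that comments and formatting in the copy stay identical to production and no third-party YAML library is needed.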

History

Druid is a very useful tool: it lets us easily load OLAP-shaped big data and query it efficiently, much faster than querying through Hive, for example. The initial downside was that users would have needed to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:

  • Pay the developers of Saiku to integrate it with Druid (this was never approved in the budget)
  • Use Caravel (we tried it, but it was buggy and much more complicated than Pivot, aimed more at analysts than at PMs)
  • Use Pivot, at the time a new open-source tool from Imply
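To give a sense of the JSON query language mentioned above, here is a minimal sketch of a Druid native "timeseries" query, built as a Python dict and serialized to JSON. The datasource and metric names are hypothetical; a query like this is sent as the body of an HTTP POST to a Druid broker's /druid/v2/ endpoint, and it is roughly the kind of request a UI such as Pivot or Turnilo generates on the user's behalf.

```python
import json


def timeseries_query(datasource: str, metric: str, start: str, end: str) -> dict:
    """Build a Druid native timeseries query that sums `metric` per day.

    `datasource` and `metric` are hypothetical names used for illustration.
    `start` and `end` form an ISO-8601 interval (end is exclusive).
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [f"{start}/{end}"],
        "aggregations": [
            # Sum the (assumed integer) metric column into an output
            # field with the same name.
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
    }


if __name__ == "__main__":
    query = timeseries_query(
        "example_pageviews", "view_count", "2017-04-01", "2017-04-08"
    )
    print(json.dumps(query, indent=2))
```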

We chose Pivot; some feedback was gathered here. Early impressions were very positive, and over time we added more datasets to Druid and Pivot, bringing a lot of value to product managers and execs. Meanwhile, Pivot's source was being closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license after November 2016. See the announcement for details.

In May 2018, we deployed Turnilo, a fork of Pivot. It does not add any new features, but it appears to be well maintained and it is certainly faster.