Analytics/Systems/Pivot

From Wikitech

Pivot is software from Imply Data, Inc. It provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Pivot comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules, etc.)

When it was initially deployed, Pivot was fully open source. A legal dispute later caused it to stop being available as open source, and we have been running the last available version as published on github.com. The following describes why we chose Pivot, the alternatives we considered, and our choices going forward.

Choosing a User Interface for Druid

Some of the criteria behind choosing Druid as our datastore are outlined at Analytics/Systems/Druid#Why_did_we_Choose_Druid._Value_Proposition, but the gist is that Druid lets us very easily load OLAP-shaped big data and query it efficiently. It is much faster than querying through Hive, for example. The initial downside was that users would need to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:

  • Pay the developers of Saiku to integrate it with Druid (this was never approved in the budget)
  • Use Caravel (we tried it, but it was buggy and much more complicated than Pivot, better suited to analysts than to PMs)
  • Use Pivot, at the time a new open-source tool from Imply
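For context, the JSON query language mentioned above is Druid's native query format, which Druid brokers accept as an HTTP POST on `/druid/v2/`. The sketch below prints a minimal timeseries query and only sends it to a broker when `SEND_QUERY=1` is set. The broker URL, the datasource `example_datasource` and the metric `view_count` are hypothetical placeholders, not our real configuration.

```sh
#!/bin/sh
# Sketch only: DRUID_BROKER, "example_datasource" and "view_count" are hypothetical.
set -eu

# 8082 is Druid's default broker port; the host name is a placeholder.
DRUID_BROKER="${DRUID_BROKER:-http://druid-broker.example:8082}"

# Emit a Druid native "timeseries" query: daily sum of a metric over one week.
build_query() {
  cat <<'EOF'
{
  "queryType": "timeseries",
  "dataSource": "example_datasource",
  "granularity": "day",
  "intervals": ["2017-04-01/2017-04-08"],
  "aggregations": [
    { "type": "longSum", "name": "views", "fieldName": "view_count" }
  ]
}
EOF
}

# Print the query by default; POST it to the broker only when explicitly requested.
if [ "${SEND_QUERY:-0}" = "1" ]; then
  build_query | curl -sS -X POST -H 'Content-Type: application/json' \
    --data-binary @- "${DRUID_BROKER}/druid/v2/?pretty"
else
  build_query
fi
```

For example, `SEND_QUERY=1 DRUID_BROKER=http://<broker-host>:8082 sh query.sh` would send it. Writing even this small query by hand is the learning curve a UI such as Pivot hides: Pivot builds Druid queries behind its interface.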

We chose Pivot; some feedback was gathered here. Early impressions were very positive, and over time we have added more datasets to Druid/Pivot, bringing a lot of value to PMs and execs. Meanwhile, the Pivot source was being closed for legal reasons. The dispute was resolved, but as of November 2016 Pivot was no longer available under the Apache 2.0 license. See the announcement for details.

We deployed the last freely available open-source version, and that is what we have been running from summer 2016 to the present (May 2017). We could have abandoned Pivot when it was closed, but we felt it brought too much value, and that we could fix bugs in the last open-source version ourselves if needed. Maintaining an active fork seemed beyond our team's resources.

More and more people are using Pivot, and loading data into Druid has reduced our time to deliver data products. But as use cases grew, the bugs in our current version of Pivot became more obvious: two major user-interface bugs are blocking work, and there are several dozen smaller ones. We found that newer versions of Pivot from Imply fix these bugs and add useful features such as world maps. We are currently evaluating whether upgrading Pivot is admissible under our fairly strict open-source-only policy.

Open Sourceness

The makers of Pivot cannot offer us an open-source license due to legal reasons, but they are firm believers in open source (Druid, their main product, is open source). They offered us a license that includes use of the source for a nominal fee, which they later donated to the WMF. We feel the team is strongly committed to open source, and they have worked with us to give us the most advantageous license they could within their legal constraints.

User Docs and Administration

Access to Pivot

You need a Wikitech login that belongs to the "wmf" or "nda" LDAP group. If your account is not in either group, please create a task like https://phabricator.wikimedia.org/T160662

Before requesting access, please make sure you:

Depending on the above, you can request to be added to the wmf group or the nda group. Please explain on the task why you need access, and ping the Analytics team if you don't hear back soon from the Opsen on duty.

Administration

Logs

On stat1001, everybody can read /var/log/pivot/syslog.log.
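A minimal sketch for inspecting that log from a shell on stat1001. It assumes that failures show up as lines containing "error" (case-insensitive), which is a guess about the log format rather than a documented convention; the helper name `recent_errors` is hypothetical, and `LOG_FILE` can be overridden to point at another file.

```sh
#!/bin/sh
# Default path is the one documented on this page; override LOG_FILE if needed.
LOG_FILE="${LOG_FILE:-/var/log/pivot/syslog.log}"

# Print the last N lines (default 20) containing "error", ignoring case.
# Matching on "error" is an assumption about how Pivot logs failures.
recent_errors() {
  grep -i 'error' "$LOG_FILE" | tail -n "${1:-20}"
}
```

For a quick look, run `recent_errors 50`; to follow the log live, `tail -f /var/log/pivot/syslog.log` is enough.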

Deploy

Deployment steps for deployment.eqiad.wmnet:

cd /srv/deployment/analytics/pivot/deploy

git pull

git submodule update --init

scap deploy
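The deployment steps above can be sketched as a single script that stops at the first failing command. The `DRY_RUN` switch and the `run`/`deploy_pivot` helpers are hypothetical conveniences added here, not scap features; `DRY_RUN` defaults to 1 so that running the script by accident only prints the commands.

```sh
#!/bin/sh
# Sketch of the Pivot deploy steps; DRY_RUN is a hypothetical safety switch.
set -eu

DEPLOY_DIR="${DEPLOY_DIR:-/srv/deployment/analytics/pivot/deploy}"

# Run a command, or just print it when DRY_RUN=1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_pivot() {
  run cd "$DEPLOY_DIR"
  run git pull
  run git submodule update --init
  run scap deploy
}

deploy_pivot
```

On deployment.eqiad.wmnet, `DRY_RUN=0 sh deploy_pivot.sh` would perform the real deploy. Because of `set -eu`, a failed `git pull` or submodule update aborts the script before `scap deploy` runs.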

The code that renders https://pivot.eqiad.wmnet runs entirely on stat1001.eqiad.wmnet and is split into two parts:

