Pentaho

From Wikitech
This page contains historical information. It may be outdated or unreliable. 2011

This page details some initial investigation into what it would take to install the Pentaho business intelligence platform for collecting stats about the Wikimedia community.

Context

A few open source projects have recently started creating dashboards that show their community's activity. The most common metric is commits to source control, but traffic in other areas, such as mailing lists, bug reports, IRC, and so on, is also interesting. There are two flagship implementations of these sorts of statistics: MeeGo (http://wiki.meego.com/Metrics/Dashboard) and Mozilla. Mozilla's dashboard is sadly inaccessible to the general public.

As a first step towards developing one of these things, I (Bhartshorne 23:37, 11 October 2011 (UTC)) spent some time digging into Pentaho and taking a first stab at setting it up.

Pentaho structure

Pentaho is actually an umbrella project made up of a number of components. From our perspective, the interesting ones are "Pentaho Data Integration" (PDI) and the "business intelligence server" (biserver). As far as I understand it, PDI manages the data from collection through transformation to storage, while the biserver does the reporting and is the front end that users interact with.
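To make that division of labour concrete, here is a minimal plain-Python sketch of the same data flow for community stats. It is not Pentaho's API and does not reflect how PDI or the biserver are configured; the `Event` record and the `extract`, `transform`, `load` and `report` functions are hypothetical names, and bucketing events into weekly counts per source is an assumption chosen for illustration. The first three stages correspond roughly to the role described for PDI, and the last to the biserver.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one community action (a commit, a mail, a bug comment, ...)."""
    source: str  # e.g. "git", "mailing-list", "bugzilla", "irc"
    author: str
    day: date


def extract(raw_rows):
    """Collection (PDI's role): turn raw (source, author, ISO date) tuples into Event records."""
    return [Event(src, author, date.fromisoformat(d)) for src, author, d in raw_rows]


def transform(events):
    """Transformation (PDI's role): count events per (ISO year, ISO week, source)."""
    counts = Counter()
    for e in events:
        year, week, _ = e.day.isocalendar()
        counts[(year, week, e.source)] += 1
    return counts


def load(counts, store):
    """Storage (PDI's role): merge counts into a store; a dict stands in for a database table."""
    for key, n in counts.items():
        store[key] = store.get(key, 0) + n
    return store


def report(store, source):
    """Reporting (biserver's role): read only the aggregated store, never the raw data."""
    rows = sorted((y, w, n) for (y, w, s), n in store.items() if s == source)
    return [f"{y}-W{w:02d}: {n}" for y, w, n in rows]


if __name__ == "__main__":
    raw = [
        ("git", "alice", "2011-10-10"),
        ("git", "bob", "2011-10-11"),
        ("irc", "alice", "2011-10-11"),
        ("git", "alice", "2011-10-17"),
    ]
    store = load(transform(extract(raw)), {})
    print(report(store, "git"))  # ['2011-W41: 2', '2011-W42: 1']
```

Keeping the reporting step dependent only on the aggregated store mirrors the split described above: the front end never needs to know how the raw data was collected.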

I tried to install the biserver. Most of it was a straightforward matter of following the MeeGo installation instructions, with two exceptions. First, the download link in those instructions is wrong: it points to a different part of the Pentaho software stack, whereas what you actually want is the biserver. Second, once I got that sorted, things went well until I tried to load the admin console, which claimed the configuration was not valid but gave no indication of what to change.

Links