Analytics/Archive/Pentaho

From Wikitech
This page contains historical information. It may be outdated or unreliable.

Pentaho instance on labs

Disclaimers: This setup is on labs and very much alpha, there is no guarantees of support.

Start/Stop Pentaho

ssh <user>@dan-pentaho.eqiad.wmflabs
nuria@dan-pentaho:/home/milimetric/biserver-ce$ ./start-pentaho.sh


Import Cube

To import a cube into Pentaho you need to:

Dump the data you want to import into a TSV file

This example is specific for the analytics research team

Within stat1003 type:

mysql --defaults-extra-file='/etc/mysql/conf.d/research-client.cnf' -h analytics-store.eqiad.wmnet \
-Dstaging -e "select * from my_table;" | cat > my_table.tsv

Note that | cat > my_table.tsv is necessary, because it transforms the mysql common tabular output into TSV format.

Also, Be sure your user belongs to research group or you'll get an error similar to:

unknown variable 'defaults-extra-file'

Now you can compress it and scp it to Pentaho's instance. From your local machine:

scp stat1003.wikimedia.org:my_table.tgz .
scp my_table.tgz dan-pentaho.eqiad.wmflabs:

And you're done.

Create a table in the database Pentaho is looking at

From you local machine, enter Pentaho's instance:

ssh dan-pentaho.eqiad.wmflabs

Open MySQL client:

mysql -uroot -p -Dwarehouse

And create the table normally. Note that the columns must correspond exactly to those in the TSV file.

Import the data into the newly created table

Within Pentaho's instance, decompress the data and import it:

mysqlimport -uroot -p --local --ignore-lines=1 warehouse my_table.tsv

Note that the TSV file must have the same name as the database table.

Create an XML file with the cube description

<Schema name="My Schema">
     <Cube name="My Cube">
         <Table name="my_table"/>
         <Dimension name="Dimension1">
             <Hierarchy hasAll="true">
                 <Level name="Dimension1" column="dimension_1" type="String"/>
             </Hierarchy>
         </Dimension>
         <Dimension name="Dimension2">
             <Hierarchy hasAll="true">
                 <Level name="Dimension2" column="dimension_2" type="Boolean"/>
             </Hierarchy>
         </Dimension>
         ...
         <Measure name="Measure 1" column="measure_1" aggregator="sum"/>
         <Measure name="Measure 2" column="measure_2" aggregator="sum"/>
         ...
     </Cube>
 </Schema>

For more information, visit Mondrian documentation on cube file format: http://mondrian.pentaho.com/documentation/schema.php

Upload the cube file to Pentaho

Go to http://pentaho.wmflabs.org/ and login:

Login as an Evaluator ->  Administrator

Then open the Data Sources Manager:

File -> Manage Data Sources

Add a new analysis data source:

+(Add) -> Analysis

When prompt, browse ... for the cube file and click Import.

Close the Data Sources Manager and create a new Saiku view:

Create New -> New Saiku Analytics

Select the cube from the drop-down in the left. And that's it.