Analytics/Archive/Pentaho
Pentaho instance on labs
Disclaimers: This setup is on labs and very much alpha, there is no guarantees of support.
Start/Stop Pentaho
ssh <user>@dan-pentaho.eqiad.wmflabs
nuria@dan-pentaho:/home/milimetric/biserver-ce$ ./start-pentaho.sh
Import Cube
To import a cube into Pentaho you need to:
Dump the data you want to import into a TSV file
This example is specific for the analytics research team
Within stat1003 type:
mysql --defaults-extra-file='/etc/mysql/conf.d/research-client.cnf' -h analytics-store.eqiad.wmnet \ -Dstaging -e "select * from my_table;" | cat > my_table.tsv
Note that | cat > my_table.tsv
is necessary, because it transforms the mysql common tabular output into TSV format.
Also, Be sure your user belongs to research group or you'll get an error similar to:
unknown variable 'defaults-extra-file'
Now you can compress it and scp it to Pentaho's instance. From your local machine:
scp stat1003.wikimedia.org:my_table.tgz . scp my_table.tgz dan-pentaho.eqiad.wmflabs:
And you're done.
Create a table in the database Pentaho is looking at
From you local machine, enter Pentaho's instance:
ssh dan-pentaho.eqiad.wmflabs
Open MySQL client:
mysql -uroot -p -Dwarehouse
And create the table normally. Note that the columns must correspond exactly to those in the TSV file.
Import the data into the newly created table
Within Pentaho's instance, decompress the data and import it:
mysqlimport -uroot -p --local --ignore-lines=1 warehouse my_table.tsv
Note that the TSV file must have the same name as the database table.
Create an XML file with the cube description
<Schema name="My Schema"> <Cube name="My Cube"> <Table name="my_table"/> <Dimension name="Dimension1"> <Hierarchy hasAll="true"> <Level name="Dimension1" column="dimension_1" type="String"/> </Hierarchy> </Dimension> <Dimension name="Dimension2"> <Hierarchy hasAll="true"> <Level name="Dimension2" column="dimension_2" type="Boolean"/> </Hierarchy> </Dimension> ... <Measure name="Measure 1" column="measure_1" aggregator="sum"/> <Measure name="Measure 2" column="measure_2" aggregator="sum"/> ... </Cube> </Schema>
For more information, visit Mondrian documentation on cube file format: http://mondrian.pentaho.com/documentation/schema.php
Upload the cube file to Pentaho
Go to http://pentaho.wmflabs.org/ and login:
Login as an Evaluator -> Administrator
Then open the Data Sources Manager:
File -> Manage Data Sources
Add a new analysis data source:
+(Add) -> Analysis
When prompt, browse ...
for the cube file and click Import
.
Close the Data Sources Manager and create a new Saiku view:
Create New -> New Saiku Analytics
Select the cube from the drop-down in the left. And that's it.