Pentaho

Pentaho dnedev Wed, 09/24/2014 - 14:30

The Pentaho suite is comprised of several application for Data Integration, Business Analytics and Reporting. It comes in both free Community edition and Enterprise one. Written (mostly) in Java, the Pentaho suite has a great support for MonetDB. There are a few tools in the suite directly useful to MonetDB users:

  • ETL: Pentaho Data Integration (PDI), or formerly know as Kettle, is a powerful tool with numerous versatile components for data Extract, Transform and Load. PDI ships with a bulk-loader for MonetDB and JDBC driver for reading data from MonetDB
  • Business Analytics: Pentaho Interactive Reporting and Dashboard Designer are tools useful for (as their names suggest) reporting and designing interactive web dashboards. 

All of the above tools plus other are available as independently downloadable community-supported applications, as well as integrated in the Pentaho Business Analytics suite.

To read or write data from/to MonetDB in PDI, one needs to create a database connection, which is established with the MonetDB JDBC driver that ships with PDI. To create a new connection, start PDI create/open a job or transformation.

1) Select Tools -> Wizard -> Create database connection.

2) Name the connection (e.g. MonetDB), select MonetDB from the database list and Native (JDBC) from the access type list.

3) Enter hostname (e.g. localhost), port (50000 by default) and your database name.

4) Enter the username and password for your database.

You can click on the Test database connection button to verify it works and then close the dialog.

In a PDI transformation you can use the MonetDB Bulk Loader to quickly load your data processed the tool. Do use the bulk-loading component, add it to your transformation, connect it to your stream and configure it to use the pre-configured connection from the Database connection drop down menu. Next, specify in which table the data should be loaded and which fields from the PDI data stream should be written.

1) Type in the Target schema name (sys by default).

2) Type in or browse to select the Target table.

3) Open the Output Fields tab and select the Incoming Stream fields that should be loaded in MonetDB.

A few notes:

- The first column (Target Table fields) is not used for mapping the data stream fields to table columns, instead only the order in which the Incoming Stream fields are arranged is taken into account. In other words if a field called 'name' is arranged as the first incoming field, it will be mapped to the first column of the target table.

- If the target table does not exist, PDI will attempt to create it, inferring the data types from these of the type of the incoming data field. Due to some mismatch between data types we recommend creating the table beforehand. Clicking on the SQL button on dialog will display the SQL statement used for creating the table, including the data types.