Installation and Introduction

Installation and Introduction mk Tue, 02/18/2020 - 11:17

MonetDB is a column-oriented Database Management System. Its code is open source, so you are free to download an install it on your system. For in-depth details on how to build from source and configure according to your needs, refer to the Server Administration section. For the sake of this tutorial, we are going to use a package that is available for download in the Download section.

The MonetDB code is available in easy to install packages for system administrators. The platforms supported with their detailed requirements are shown on the download menu. A private install can be obtained by downloading and compiling the source distribution. You don't need super-user or system administrator privileges for this.

After installation of MonetDB software you will find a number of programs in the bin sub-directory of your installation directory. Below is an overview of the server and client programs distributed with MonetDB.

Server program Description Documentation
monetdbd The MonetDB Database Server daemon.Once started it runs in the background and facilitates, amongst others, management of local (and remote) mserver5's. The daemon is controlled by the client application monetdb. Note: monetdbd is currently only available on UNIX-like platforms. [See man page]
mserver5 The MonetDB server, version 5. It runs as a console program. Client programs (except monetdb) connect to this server process. [See man page]
Client program Description Documentation
monetdb A command-line program to control the monetdbd deamon server by executing a single command. It allows you for instance to start & stop an mserver5 process or create a new database. [See man page]
mclient A command-line program to interactively communicate to a running mserver5 process. The program is intended for technical persons, e.g. DBA's, SQL developers and system adminstrators. [See man page]
jdbcclient.jar A Java command-line program to interactively communicate to a running mserver5 process via the MonetDB JDBC driver. The program is intended for technical persons, e.g. SQL developers and DBA's. Note: This program is a separate download. [See man page]
msqldump A command-line program to export the structure and content of a database or a specific table to a file. This allows you to quickly export and re-create the database or table on a different server. [See man page]


Windows Tutorial

Windows Tutorial mk Mon, 02/24/2020 - 17:27

Although monetdbd is only available in Unix systems, it is still possible to use MonetDB on Windows.

On Windows systems, the first step is to initialize a MonetDB server by simply opening: 'Start -> Programs -> MonetDB -> Start server'. 

Subsequently you can start the textual interface (mclient) by opening: 'Start -> Programs -> MonetDB -> SQL client'.  The commands entered there are identical to those found on other platforms. To stop the server, you can simply close the MonetDB SQL Server window. 

Note the server by default only accepts connections originating from the local host. If you need other machines to access your database, change the configuration file by setting mapi_open=yes.

Exploring the wealth of functionality offered by MonetDB/SQL is best started using a toy database. For this we use the VOC database which provides a peephole view into the administrative system of an early multi-national company, the VOC (or Dutch East India Company).

Download the VOC dataset (542K) gz (519K) bz2 (371K) which is a compressed file with SQL statements (SQL database dump). After the file has been extracted, load its contents into MonetDB using the mclient.exe file. From within mclient run:

sql> \< voc_dump.sql

Since the SQL interface is the same on all platforms, from here on you can follow the later part of the regular SQL tutorial.

Changing database location

If you need (or want) to change the location of the database then editing of the monetdb script is required. Change the mclient.bat and M5server.bat files in 'C:\Program Files\MonetDB\MonetDB5'. In the M5server.bat file you would need to change the setting of the MONETDBFARM variable, and in mclient.bat you would need to add a -d option to the call of mclient.exe.  Note: you upgrade such changed files will be overwritten.

About the VOC dataset

The data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships returned safely (or wrecked on the way) by the VOC (or Dutch East India Company). A total of 8000 records are provided. They include information about ship name and type, captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information. You can find more about the VOC on Wikipedia.

UI and Compatibility

UI and Compatibility mk Tue, 03/03/2020 - 10:54

Generally speaking, all the MonetDB programs are started from a command shell (such as bash or cmd.exe) or invoked from a script file. They do not have a Graphical User Interface (GUI)

However, as a user you may prefer to use more friendly GUI-programs. Many good GUI-programs are available. Those generic GUI-programs typically allow you to connect to any DBMS via standard database connectivity API such as ODBC or JDBC. MonetDB provides both an ODBC driver and a JDBC driver. The JDBC driver is usable for programs written in Java. There are also MonetDB server connectivity API libraries available for popular programming languages such as Python, PHP, Perl. [More].

If you need the MonetDB ODBC driver or JDBC driver, you will need to download, install and configure them at the computer OS where the GUI-program is installed or started. For the ODBC driver you will need system administrator privileges to install the MonetDB ODBC driver components. For the JDBC driver no sys-admin privileges are needed, you only have to download the single jar (Java ARchive) library file (i.e. monetdb-jdbc-2.29.jre7.jar) from the download area and configure in your GUI-program where the jar file is stored (or add it to the CLASSPATH environment setting or java -cp startup parameter). The MonetDB JDBC driver is a 100% pure Java (type 4) driver, which means the same JDBC driver can be deployed on all platforms which support the Java Runtime Environment (JRE) version 1.7 or higher.

MonetDB.R Tutorial

MonetDB.R Tutorial dnedev Tue, 02/25/2020 - 11:38

The MonetDB.R tutorial is designed to familiarise R users with the functionality of the MonetDB.R connector and dplyr library for data manipulation in R. 

Before stating with this tutorial, make sure you have a running instance on MonetDB to which you can connect. As such, the MonetDB/SQL tutorial is a recommended reading, even more, given that this page was adapted from the SQL tutorial.

To start, in case you have not done it yet, create a DB farm using the MonetDB daemon monetdbd and a new database called 'voc'. The daemon is controlled by the application monetdb. See its documentation for all details. Note: The monetdbd is not available on Windows. For that reason, on these systems a default database called demo is available by default (Windows-specific MonetDB tutorial is available here). 

From that moment on, you can start your favourite R shell, setup a MonetDB.R connection and process your data in R.
Now, install MonetDB.R from your R shell.
> install.packages("MonetDB.R")

At this time you will be prompted to select a mirror.

After MonetDB.R is installed you must load it.

> library(MonetDB.R)
Loading required package: DBI
Loading required package: digest

Now, let's create a connection to the database.

Note that by default MonetDB runs on port 50000, but MonetDB.R already knows that, thus there is no need to set the port. Also, in the tutorial we've use the demo database, which is the default on Windows.

> conn <- dbConnect(MonetDB.R(), host="localhost", dbname="demo", user="monetdb", password="monetdb")

Let's execute a dummy 'hello world' SQL query and in our database directly from R.

> dbGetQuery(conn,"SELECT 'hello world'")
1  hello world

To illustrate the use of the newly created account and schema, the following example connects and creates a table, which is dropped afterwards by aborting the transaction.

> dbTransaction(conn)
> dbSendUpdate(conn,"CREATE TABLE test (id int, data varchar(30))")
> dbListTables(conn)
[1] "test"
> dbExistsTable(conn,"test")
[1] TRUE
> dbRollback(conn)
[1] TRUE
> dbListTables(conn)

For the purpose of the tutorial we have prepared a TSV (tab-separated values) version of the VOC dataset, which contains data for around 8000 voyages. After unpacking the .zip you can import the data into the directly from R.
Using the function we can load the specified file as a new table 'craftsmen', specifying the upper bound on rows to import (2500), the delimiter (\t). The last parameter indicates that at least 1474 rows need to be read, before before the function can safely auto-deretmine the datatype of the columns.

> file <- "voc_craftsmen.tsv"
>, file, "craftsmen", 2350, delim="\t", nrow.check=1474)
voc_craftsmen.tsv /Users/monetdb/voc_craftsmen.tsv 
1 2348

Now, let's repeat the same for the other files in the dataset.

> file <- "voc_impotenten.tsv", file, "impotenten", 1000, delim="\t", nrow.check=160)
file <- "voc_invoices.tsv", file, "invoices", 4000, delim="\t", nrow.check=1000)
file <- "voc_passengers.tsv", file, "passengers", 3000, delim="\t", nrow.check=1000)
file <- "voc_seafarers.tsv", file, "seafarers", 4500, delim="\t", nrow.check=4500)
file <- "voc_soldiers.tsv", file, "soldiers", 4500, delim="\t", nrow.check=4000)
file <- "voc_total.tsv", file, "total", 2500, delim="\t", nrow.check=2000)
file <- "voc_voyages.tsv", file, "voyages", 8200, delim="\t", nrow.check=5000)


Next, let's install and load dplyr, which is designed to facilitate efficient datasets manipulation in R.

> install.packages("dplyr")
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union

Make a new connection to the database for dplyr, specifying that the source is a MonetDB instance.

> monetdb_conn <- src_monetdb("demo")

Now, lets create non-matirialized views for all tables from the database we are going to need later in R. Note that none of these will actaully move data from MonetDB into R, rather we are just creating a reference to the tables.

> craftsmen <- tbl(monetdb_conn, "craftsmen")
impotenten <- tbl(monetdb_conn, "impotenten")
invoices <- tbl(monetdb_conn, "invoices")
passengers <- tbl(monetdb_conn, "passengers")
seafarers <- tbl(monetdb_conn, "seafarers")
soldiers <- tbl(monetdb_conn, "soldiers")
total <- tbl(monetdb_conn, "total")
voyages <- tbl(monetdb_conn, "voyages")

Let's check out the data we just loaded. First, let's see the dimensions and the column names of the voyages table.

> dim(voyages)
[1] 8131   22
> colnames(voyages)
 [1] "number"            "number_sup"        "trip"             
 [4] "trip_sup"          "boatname"          "master"           
 [7] "tonnage"           "type_of_boat"      "built"            
[10] "bought"            "hired"             "yard"             
[13] "chamber"           "departure_date"    "departure_harbour"
[16] "cape_arrival"      "cape_departure"    "cape_call"        
[19] "arrival_date"      "arrival_harbour"   "next_voyage"      
[22] "particulars"      

As you can see, the respose is actaully an R array.

Since we have more than 8000 rows, let's have a look at the first few.

> head(voyages)

If you are interested in what queries are executed in MonetDB you can turn on the query debug

> options(monetdb.debug.query=T)
> head(voyages)
QQ: 'SELECT "number", "number_sup", "trip", "trip_sup", "boatname", "master", "tonnage", "type_of_boat", "built", "bought", "hired", "yard", "chamber", "departure_date", "departure_harbour", "cape_arrival", "cape_departure", "cape_call", "arrival_date", "arrival_harbour", "next_voyage", "particulars"
FROM "voyages"
QQ: Query result for query 29 with 6 rows and 22 cols, 6 rows.

As you can see, the connector uses the native LIMIT SQL function for the head query.

Disable the debug view for now.

> options(monetdb.debug.query=F)

Let's see how many impotenten we have in the dataset:

> summarise(impotenten, count = n())

1  938
.. ...

It is possible to play with the set in many ways, to find out several things that took place during the voyages of the ships, or the money that was earned. A few examples are shown below:

> invoices_grouped_per_chamber <- group_by(invoices, chamber)
> average_invoices_per_chamber <- summarise(invoices_grouped_per_chamber, average = mean(invoice))
> arrange(average_invoices_per_chamber, desc(average))
Arrange: desc(average) 

   chamber  average
1        A 282996.6
2        Z 259300.5
3        H 150182.4
4        R 149628.3
5        D 149522.8
6        E 149518.0
7       NA  83309.0
..     ...      ...

See also

If you want to lear more about MonetDB.R connector or the dplyr, follow the links below:

The MonetDB also supports embedded R queries in the database. This feature was introduced in the Oct2014 release. Check out this blog post for more information.

About the VOC dataset

VOCThe data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships returned safely (or wrecked on the way) by the VOC (or Dutch East India Company). A total of 8000 records are provided. They include information about ship name and type, captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information. You can find more about the VOC on Wikipedia.