Documentation > User Guide > Getting Started > Tutorial: VOC Database

Tutorial: VOC Database

Following the installation, it is time to get acquainted with the MonetDB suite and its capabilities.

In this tutorial, you will learn how to use the MonetDB suite to analyse datasets catalogued in Dutch-Asiatic Shipping in the 17th and 18th Centuries by J.R. Bruijn, F.S. Gaastra and I. Schöffer. Over 8000 records have been provided, detailing the voyages of the Dutch East India Company (abbreviated to VOC according to the company's official name in Dutch) to Asia and the Cape of Good Hope. The data contains information such as the names of the ships, types of vessels used, names of the captains, arrival and departure dates at harbors, personnel accounts, and whether a ship has survived the journey or met an untimely fate.

The dataset can be loaded into MonetDB with the voc_dump.sql file provided in three different compressed formats listed below. Given that each listed item contains the same file with the same SQL statements, it is only necessary to download one of the compressed files:

After extracting the voc_dump.sql file, its contents can be loaded into MonetDB via the mclient command-line interface. But before this becomes possible, there are some preliminary operating system-dependent steps that need to be completed.

Linux and macOS
Users who have installed MonetDB on Unix-like operating systems (i.e. Linux and macOS) will first have to start the MonetDB daemon monetdbd. It will run in the background and facilitate, among other functionalities, the management of local/remote servers. The daemon is controlled by the application monetdb.
If during installation on a Linux system you have enabled systemd to manage the MonetDB suite, a database farm named dbfarm will have been created automatically. In this case, all your databases will be stored in /var/monetdb5/dbfarm
Certain Linux distributions, as well as macOS, do not allow this option. In this case, you will have to create your own database farm.
Given the option, if you have not enabled systemd to manage the MonetDB suite but wish to do so, then execute the following commands in the shell:
sudo systemctl enable monetdbd
sudo systemctl start monetdbd
OR
If you want/need to create a database farm of your own, you may execute the following commands instead (make sure to choose a suitable name and directory for your database farm):
monetdbd create /path/to/mydbfarm
monetdbd start /path/to/mydbfarm

Windows
Although the MonetDB daemon monetdbd is only available for Unix-like operating systems (such as Linux and macOS), it is still possible to use the MonetDB suite on Windows. The database demo is created by default.
The first step is to initialize a MonetDB server by simply opening Start -> Programs -> MonetDB -> Start server
Now it is possible to start the textual interface mclient by opening Start -> Programs -> MonetDB -> SQL client
The same commands apply as on other platforms. To stop the server, you can simply close the MonetDB SQL Server window. Note that by default the server only accepts connections originating from the local host. If you need other machines to access your database, change the configuration file by setting mapi_open=yes
If you want/need to change the location where your databases are stored, then it is necessary to edit the mclient.bat and M5server.bat files in C:\Program Files\MonetDB\MonetDB5. In the M5server.bat file, change the setting of the MONETDBFARM variable. In mclient.bat, add -d to the mclient.exe call command. The modified files will be overwritten after an update.

Keep in mind the distinction between monetdbd, which is the MonetDB daemon, and monetdb, which is the application that controls the daemon.

Since the SQL interface is the same on all operating systems, all users may proceed with the remainder of this tutorial once a MonetDB database server is running on their system.

Creating a new database

To create a new database, execute the following commands in the shell:

monetdb create voc
monetdb release voc

The create command initializes a new database in the MonetDB database server and the release command brings a database out of maintenance mode for regular use by a client.

To check the state of the database, execute monetdb status voc in the shell. This should produce an output similar to the following:

name  state   health               remarks
voc   S                 mapi:monetdb://fedora:50000/voc

We see that the new database has successfully been created and released from maintenance mode. State "S", which stands for 'Stopped', implies that the database is inactive but ready for use.

Executing monetdb status in the shell will show the state of each known database in the MonetDB database server operating on your system.

Starting mclient

Now it is possible to interact with the newly created database by executing the following command:

mclient -u monetdb -d voc

The above command calls the mclient command-line tool with the -u option, which specifies the user to connect as (in this case the default 'monetdb'), and -d option, which specifies the database to connect to. You will be prompted to enter a password, which for the default user 'monetdb' is monetdb.

Entering the password will activate the SQL interface, which should have a similar appearance as the output displayed below:

Welcome to mclient, the MonetDB/SQL interactive terminal (Mar2025-SP2)
Database: MonetDB v11.53.13 (Mar2025-SP2), 'mapi:monetdb://fedora:50000/voc'
FOLLOW US on https://github.com/MonetDB/MonetDB
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>

SQL statements can be executed after the sql> prompt. As a test, try:

SELECT 'hello world';

Which should produce:

+-------------+
| %2          |
+=============+
| hello world |
+-------------+
1 tuple

Following this quick test, we can finally explore the wealth of functionalities offered by the MonetDB suite.

Creating a new user and schema

Before loading the VOC datasets into MonetDB, you are advised to introduce an alternative user login. Let's create a new user 'VOC Explorer' with username 'voc' and password voc via the following SQL statement:

CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";

To verify that the new user has been created, let's pull the full list of users via SELECT * FROM sys.users;

+-----------+---------------+----------------+-------------+------------+-------------+--------------+--------------+
| name      | fullname      | default_schema | schema_path | max_memory | max_workers | optimizer    | default_role |
+===========+===============+================+=============+============+=============+==============+==============+
| monetdb   | MonetDB Admin |           2000 | "sys"       |          0 |           0 | default_pipe |            3 |
| .snapshot | Snapshot User |           2000 | "sys"       |          0 |           0 | default_pipe |         6265 |
| voc       | VOC Explorer  |           2000 | "sys"       |          0 |           0 | default_pipe |         7163 |
+-----------+---------------+----------------+-------------+------------+-------------+--------------+--------------+
3 tuples

We see that the user 'VOC Explorer' with username 'voc' has indeed been created. The column default_role displays the unique authorization identifier for each user.

To view the full list of authorization objects in the database, execute SELECT * FROM sys.auths;

+------+-----------+---------+
| id   | name      | grantor |
+======+===========+=========+
|    1 | public    |       0 |
|    2 | sysadmin  |       0 |
|    3 | monetdb   |       0 |
| 6265 | .snapshot |       3 |
| 7163 | voc       |       3 |
+------+-----------+---------+
5 tuples

We see that the user 'voc' has been assigned the number 7163 as its authorization ID.

Now let's create a new schema named 'voc' and give authorization to 'VOC Explorer':

CREATE SCHEMA "voc" AUTHORIZATION "voc";

To see the full list of schemas, execute SELECT * FROM sys.schemas;

+------+--------------------+---------------+-------+--------+
| id   | name               | authorization | owner | system |
+======+====================+===============+=======+========+
| 2000 | sys                |             2 |     3 | true   |
| 2114 | tmp                |             2 |     3 | true   |
| 5787 | json               |             3 |     3 | true   |
| 5883 | profiler           |             3 |     3 | true   |
| 6695 | logging            |             3 |     3 | true   |
| 6829 | information_schema |             3 |     3 | true   |
| 7164 | voc                |          7163 |     3 | false  |
+------+--------------------+---------------+-------+--------+
7 tuples

We see that the schema 'voc' has been created with the unique internal ID 7164 and that authorization has been granted to the user 'VOC Explorer' given the number 7163 in the authorization column.

Now we are obliged to alter permissions by means of the following statement:

ALTER USER "voc" SET SCHEMA "voc";

If we pull the list of users again via SELECT * FROM sys.users;, we see that the default schema for user 'VOC Explorer' has been changed to 'voc' given the number 7164 in the default_schema column:

+-----------+---------------+----------------+-------------+------------+-------------+--------------+--------------+
| name      | fullname      | default_schema | schema_path | max_memory | max_workers | optimizer    | default_role |
+===========+===============+================+=============+============+=============+==============+==============+
| monetdb   | MonetDB Admin |           2000 | "sys"       |          0 |           0 | default_pipe |            3 |
| .snapshot | Snapshot User |           2000 | "sys"       |          0 |           0 | default_pipe |         6265 |
| voc       | VOC Explorer  |           7164 | "sys"       |          0 |           0 | default_pipe |         7163 |
+-----------+---------------+----------------+-------------+------------+-------------+--------------+--------------+
3 tuples

So far, all interactions with the database have been conducted as the default user.

To change from the default user to 'voc', execute the following:

SET CURRENT_USER = voc;

Check with SELECT CURRENT_USER;

+------+
| %2   |
+======+
| voc  |
+------+
1 tuple

To change from the default 'sys' schema to the 'voc' schema:

SET SCHEMA voc;

Check with SELECT CURRENT_SCHEMA;

+------+
| %2   |
+======+
| voc  |
+------+
1 tuple

Transaction management

Before we load the VOC datasets into our database, let's first create a throw-away database object and learn about transaction management in the process.

Since this is a test, we want to turn off autocommit via the START TRANSACTION; statement (from here on, all changes will be dropped after aborting the transaction via the ROLLBACK; statement at the end).

After turning off autocommit, create a new table called 'test' via the following statement:

CREATE TABLE test (id INT, data TEXT);

Executing \d will list all known database objects:

TABLE  voc.test

From the above, we see that the newly created table 'test' is contained within the schema 'voc'.

Executing \d test will describe the database object 'test':

CREATE TABLE "voc"."test" (
        "id"   INTEGER,
        "data" CHARACTER LARGE OBJECT
);

Here we see that the database object 'test' is a table comprising two columns titled id and data; the former has the integer data type and the latter has a character string data type.

Now that we are done with the test, we can execute ROLLBACK; in order to drop all changes made since the start of the transaction. If you execute \d now, you will not see the table 'test' anymore.

Loading the VOC dataset

Now it's time to import the VOC datasets into the database via the voc_dump.sql file that was downloaded at the start of this tutorial.

To do so, copy and paste the line below into the mclient terminal, then replace /full/path/to with the actual path to the location of the voc_dump.sql file on your system:

\< /full/path/to/voc_dump.sql

The above command executes the SQL statements contained in the voc_dump.sql script file, which first creates the tables and then loads the datasets into those tables.

Pulling the list of database objects via \d should produce the following:

TABLE  voc.craftsmen
TABLE  voc.impotenten
TABLE  voc.invoices
TABLE  voc.passengers
TABLE  voc.seafarers
TABLE  voc.soldiers
TABLE  voc.total
TABLE  voc.voyages

To check if the data has indeed been loaded, let's count the number of rows in the 'voyages' table by executing SELECT COUNT(*) FROM voyages;

The output should be:

+------+
| %1   |
+======+
| 8131 |
+------+
1 tuple

If the 'voyages' table had been created without the relevant dataset being loaded into it, then the above value would be zero. Since it is not zero, we know that the data has successfully been loaded into the database.

Primary keys and foreign keys

The VOC dataset comprises eight tables bound to each other via foreign key relationships. A foreign key in one table refers to the primary key of another table, which is a set of attributes (commonly one or more columns) comprising unique identifiers for each record in a table.

In this case, the 'voyages' table is the main table that all other tables refer to; it only comprises a primary key. The primary key of this table is composed of an index number and supplement (in the form of a string character) for cases where an index number occurs more than once in the same table. This ensures that each individual record in the table can be identified by a unique set of attributes.

To see which records have the same index number in the 'voyages' table, execute the following statement:

SELECT number, number_sup FROM voyages WHERE number IN ( 
        SELECT number FROM voyages GROUP BY number HAVING COUNT(*) > 1 
    );

The outcome should be:

+--------+------------+
| number | number_sup |
+========+============+
|   5022 |            |
|   5022 | A          |
|   5980 |            |
|   5980 | A          |
|   5987 |            |
|   5987 | A          |
|   6246 |            |
|   6246 | A          |
|   6649 |            |
|   6649 | A          |
+--------+------------+
10 tuples

As is evident from the records displayed above, the index number supplement in the number_sup column allows these records to be distinguished from each other.

Every table, except 'invoices', has the combination of the columns number, containing the voyage index numbers, and number_sup, containing the index number supplements, as the primary key.

As an example, let's pull a description of the 'craftsmen' table by executing \d craftsmen:

CREATE TABLE "voc"."craftsmen" (
        "number"               INTEGER       NOT NULL,
        "number_sup"           CHAR(1)       NOT NULL,
        "trip"                 INTEGER,
        "trip_sup"             CHAR(1),
        "onboard_at_departure" INTEGER,
        "death_at_cape"        INTEGER,
        "left_at_cape"         INTEGER,
        "onboard_at_cape"      INTEGER,
        "death_during_voyage"  INTEGER,
        "onboard_at_arrival"   INTEGER,
        CONSTRAINT "craftsmen_number_number_sup_pkey" PRIMARY KEY ("number", "number_sup"),
        CONSTRAINT "craftsmen_number_number_sup_fkey" FOREIGN KEY ("number", "number_sup") REFERENCES "voc"."voyages" ("number", "number_sup")
);

From the first constraint, we see that the 'craftsmen' table indeed has the combination of the number and number_sup columns as the primary key. The second constraint shows that the foreign key references the primary key in the table 'voyages'.

Now let's pull a description of the 'invoices' table via \d invoices:

CREATE TABLE "voc"."invoices" (
        "number"     INTEGER,
        "number_sup" CHAR(1),
        "trip"       INTEGER,
        "trip_sup"   CHAR(1),
        "invoice"    INTEGER,
        "chamber"    CHAR(1),
        CONSTRAINT "invoices_number_number_sup_fkey" FOREIGN KEY ("number", "number_sup") REFERENCES "voc"."voyages" ("number", "number_sup")
);

Here we see that a primary key has indeed been omitted from the 'invoices' table. The reason for this stems from the fact that multiple invoices can be assigned to a single voyage, which results in multiple records with the same combination of voyage index number and supplement. As a result, the combination of the number and number_sup columns in the 'invoices' table cannot be assigned as a primary key.

Creating a view

A "view" in SQL terminology is essentially a stored query that has been assigned a unique name. The same statements used to query a table apply to a view. It is a useful feature that prevents repetition on the user's end.

Let's define a view called 'ship_occupants' that combines into one big table all records of the different groups of people logged in the 'craftsmen', 'impotenten', 'passengers', 'seafarers', 'soldiers' and 'total' tables:

CREATE VIEW ship_occupants AS
        SELECT * FROM (
                SELECT 'craftsmen' AS type, craftsmen.* FROM craftsmen
                UNION ALL
                SELECT 'impotenten' AS type, impotenten.* FROM impotenten
                UNION ALL
                SELECT 'passengers' AS type, passengers.* FROM passengers
                UNION ALL
                SELECT 'seafarers' AS type, seafarers.* FROM seafarers
                UNION ALL
                SELECT 'soldiers' AS type, soldiers.* FROM soldiers
                UNION ALL
                SELECT 'total' AS type, total.* FROM total
        ) AS ship_occupants_table;

The newly created view will show in the list of database objects when executing \d:

TABLE  voc.craftsmen
TABLE  voc.impotenten
TABLE  voc.invoices
TABLE  voc.passengers
TABLE  voc.seafarers
VIEW   voc.ship_occupants
TABLE  voc.soldiers
TABLE  voc.total
TABLE  voc.voyages

The same statements used to query a table apply to a view. To showcase this, let's calculate the number of people recorded in each group of occupants via the following statement:

SELECT type, COUNT(*) AS total FROM ship_occupants GROUP BY type ORDER BY type;

This should lead to the following output:

+------------+-------+
| type       | total |
+============+=======+
| craftsmen  |  2349 |
| impotenten |   938 |
| passengers |  2813 |
| seafarers  |  4468 |
| soldiers   |  4177 |
| total      |  2467 |
+------------+-------+
6 tuples

To verify the accuracy of the view, let's calculate the number of passengers via the original 'passengers' table:

SELECT count(*) FROM passengers;

You should receive the following output:

+------+
| %1   |
+======+
| 2813 |
+------+
1 tuple

Session termination and logins

To terminate the connection to the server, execute \q or press Ctrl + d. This will return your session to the regular shell terminal.

Next time you wish to interact with the 'voc' database, you can execute the command mclient -u voc -d voc in the shell and enter the password voc when prompted. This will return you to the same session as used in this tutorial.

Alternatively, login credentials can be saved in the .monetdb file in your home directory. Using your preferred text editor, copy and paste the following text into ~/.monetdb:

user=voc
password=voc

From now on, executing mclient -d voc will automatically start the interactive terminal for the database 'voc' as user 'voc' without a password prompt.

Endnote

While you should now have a good overview of the MonetDB suite and its capabilities, we encourage you to experiment and play around with the VOC (or other) datasets to further your understanding of MonetDB/SQL. We hope this tutorial has been informative and that you will be able solve your database challenges with MonetDB!