Tutorial

Tutorial giulia Tue, 02/18/2020 - 11:05

This section is an introductory guide to MonetDB. It contains instructions for a simple installation, a tutorial for basic usage, an introduction to SQL language and an overview of Advanced Features in MonetDB.

For in-depth details, refer to the appropriate sections in the documentation

Installation and Introduction

Installation and Introduction mk Tue, 02/18/2020 - 11:17

MonetDB is a column-oriented Database Management System. Its code is open source, so you are free to download an install it on your system. For in-depth details on how to build from source and configure according to your needs, refer to the Server Administration section. For the sake of this tutorial, we are going to use a package that is available for download in the Download section

The MonetDB code is available in easy to install packages for system administrators. The platforms supported with their detailed requirements are shown on the download menu. A private install can be obtained by downloading and compiling the source distribution.  You don't need super-user or system administrator privileges for this.

After installation of MonetDB software you will find a number of programs in the bin sub-directory of your installation directory. Below is an overview of the server and client programs distributed with MonetDB.

Server program Description Documentation
monetdbd The MonetDB Database Server daemon.Once started it runs in the background and facilitates, amongst others, management of local (and remote) mserver5's. The daemon is controlled by the client application monetdb. Note: monetdbd is currently only available on UNIX-like platforms. [See manpage]
mserver5 The MonetDB server, version 5. It runs as a console program. Client programs (except monetdb) connect to this server process. [See manpage]
Client program Description Documentation
monetdb A command-line program to control the monetdbd deamon server by executing a single command. It allows you for instance to start & stop an mserver5 process or create a new database. [See manpage]
mclient A command-line program to interactively communicate to a running mserver5 process. The program is intended for technical persons, e.g. DBA's, SQL developers and system adminstrators. [See manpage]
jdbcclient.jar A Java command-line program to interactively communicate to a running mserver5 process via the MonetDB JDBC driver. The program is intended for technical persons, e.g. SQL developers and DBA's. Note: This program is a separately downloaded. [See manpage]
msqldump A command-line program to export the structure and content of a database or a specific table to a file. This allows you to quickly export and re-create the database or table on a different server. [See manpage]

 

Windows Tutorial

Windows Tutorial mk Mon, 02/24/2020 - 17:27

Although monetdbd is only available in Unix systems, it is still possible to use MonetDB on Windows.

On Windows systems, the first step is to initialize a MonetDB server by simply opening: 'Start -> Programs -> MonetDB -> Start server'. 

Subsequently you can start the textual interface (mclient) by opening: 'Start -> Programs -> MonetDB -> SQL client'.  The commands entered there are identical to those found on other platforms. To stop the server, you can simply close the MonetDB SQL Server window. 

Note the server by default only accepts connections originating from the local host. If you need other machines to access your database, change the configuration file by setting mapi_open=yes.

Exploring the wealth of functionality offered by MonetDB/SQL is best started using a toy database. For this we use the VOC database which provides a peephole view into the administrative system of an early multi-national company, the VOC (or Dutch East India Company).

Download the VOC dataset voc_dump.zip (542K) gz (519K) bz2 (371K) which is a compressed file with SQL statements (SQL database dump). After the file has been extracted, load its contents into MonetDB using the mclient.exe file. From within mclient run:

sql> \< voc_dump.sql

Since the SQL interface is the same on all platforms, from here on you can follow the later part of the regular SQL tutorial.

Changing database location

If you need (or want) to change the location of the database then editing of the monetdb script is required. Change the mclient.bat and M5server.bat files in 'C:\Program Files\MonetDB\MonetDB5'. In the M5server.bat file you would need to change the setting of the MONETDBFARM variable, and in mclient.bat you would need to add a -d option to the call of mclient.exe.  Note: you upgrade such changed files will be overwritten.

About the VOC dataset

The data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships returned safely (or wrecked on the way) by the VOC (or Dutch East India Company). A total of 8000 records are provided. They include information about ship name and type, captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information. You can find more about the VOC on Wikipedia.

UI and Compatibility

UI and Compatibility mk Tue, 03/03/2020 - 10:54

Generally speaking, all the MonetDB programs are started from a command shell (such as bash or cmd.exe) or invoked from a script file. They do not have a Graphical User Interface (GUI)

However, as a user you may prefer to use more friendly GUI-programs. Many good GUI-programs are available. Those generic GUI-programs typically allow you to connect to any DBMS via standard database connectivity API such as ODBC or JDBC. MonetDB provides both an ODBC driver and a JDBC driver. The JDBC driver is usable for programs written in Java. There are also MonetDB server connectivity API libraries available for popular programming languages such as Python, PHP, Perl. [More].

If you need the MonetDB ODBC driver or JDBC driver, you will need to download, install and configure them at the computer OS where the GUI-program is installed or started. For the ODBC driver you will need system administrator privileges to install the MonetDB ODBC driver components. For the JDBC driver no sys-admin privileges are needed, you only have to download the single jar (Java ARchive) library file (such as monetdb-jdbc-2.29.jre7.jar) from the download area and configure in your GUI-program where the jar file is stored (or add it to the CLASSPATH environment setting or java -cp startup parameter). The MonetDB JDBC driver is a 100% pure Java (type 4) driver, which means the same JDBC driver can be deployed on all platforms which support the Java Runtime Environment (JRE) version 1.7 or higher.

MonetDB.R Tutorial

MonetDB.R Tutorial dnedev Tue, 02/25/2020 - 11:38

The MonetDB.R tutorial is designed to familiarise R users with the functionality of the MonetDB.R connector and dplyr library for data manipulation in R. 

Before stating with this tutorial, make sure you have a running instance on MonetDB to which you can connect. As such, the MonetDB/SQL tutorial is a recommended reading, even more, given that this page was adapted from the SQL tutorial.

To start, in case you have not done it yet, create a DB farm using the MonetDB daemon monetdbd and a new database called 'voc'. The daemon is controlled by the application monetdb. See its documentation for all details. Note: The monetdbd is not available on Windows. For that reason, on these systems a default database called demo is available by default (Windows-specific MonetDB tutorial is available here). 

From that moment on, you can start your favourite R shell, setup a MonetDB.R connection and process your data in R.
 
Now, install MonetDB.R from your R shell.
> install.packages("MonetDB.R")

At this time you will be prompted to select a mirror.

After MonetDB.R is installed you must load it.

> library(MonetDB.R)
Loading required package: DBI
Loading required package: digest

Now, let's create a connection to the database.

Note that by default MonetDB runs on port 50000, but MonetDB.R already knows that, thus there is no need to set the port. Also, in the tutorial we've use the demo database, which is the default on Windows.

> conn <- dbConnect(MonetDB.R(), host="localhost", dbname="demo", user="monetdb", password="monetdb")

Let's execute a dummy 'hello world' SQL query and in our database directly from R.

> dbGetQuery(conn,"SELECT 'hello world'")
  single_value
1  hello world

To illustrate the use of the newly created account and schema, the following example connects and creates a table, which is dropped afterwards by aborting the transaction.

> dbTransaction(conn)
> dbSendUpdate(conn,"CREATE TABLE test (id int, data varchar(30))")
> dbListTables(conn)
[1] "test"
> dbExistsTable(conn,"test")
[1] TRUE
> dbRollback(conn)
[1] TRUE
> dbListTables(conn)
character(0)

For the purpose of the tutorial we have prepared a TSV (tab-separated values) version of the VOC dataset, which contains data for around 8000 voyages. After unpacking the .zip you can import the data into the directly from R.
Using the monetdb.read.csv function we can load the specified file as a new table 'craftsmen', specifying the upper bound on rows to import (2500), the delimiter (\t). The last parameter indicates that at least 1474 rows need to be read, before before the function can safely auto-deretmine the datatype of the columns.

> file <- "voc_craftsmen.tsv"
> monetdb.read.csv(conn, file, "craftsmen", 2350, delim="\t", nrow.check=1474)
voc_craftsmen.tsv /Users/monetdb/voc_craftsmen.tsv 
    L1
1 2348

Now, let's repeat the same for the other files in the dataset.

> file <- "voc_impotenten.tsv"
monetdb.read.csv(conn, file, "impotenten", 1000, delim="\t", nrow.check=160)
file <- "voc_invoices.tsv"
monetdb.read.csv(conn, file, "invoices", 4000, delim="\t", nrow.check=1000)
file <- "voc_passengers.tsv"
monetdb.read.csv(conn, file, "passengers", 3000, delim="\t", nrow.check=1000)
file <- "voc_seafarers.tsv"
monetdb.read.csv(conn, file, "seafarers", 4500, delim="\t", nrow.check=4500)
file <- "voc_soldiers.tsv"
monetdb.read.csv(conn, file, "soldiers", 4500, delim="\t", nrow.check=4000)
file <- "voc_total.tsv"
monetdb.read.csv(conn, file, "total", 2500, delim="\t", nrow.check=2000)
file <- "voc_voyages.tsv"
monetdb.read.csv(conn, file, "voyages", 8200, delim="\t", nrow.check=5000)

dplyr

Next, let's install and load dplyr, which is designed to facilitate efficient datasets manipulation in R.

> install.packages("dplyr")
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union

Make a new connection to the database for dplyr, specifying that the source is a MonetDB instance.

> monetdb_conn <- src_monetdb("demo")

Now, lets create non-matirialized views for all tables from the database we are going to need later in R. Note that none of these will actaully move data from MonetDB into R, rather we are just creating a reference to the tables.

> craftsmen <- tbl(monetdb_conn, "craftsmen")
impotenten <- tbl(monetdb_conn, "impotenten")
invoices <- tbl(monetdb_conn, "invoices")
passengers <- tbl(monetdb_conn, "passengers")
seafarers <- tbl(monetdb_conn, "seafarers")
soldiers <- tbl(monetdb_conn, "soldiers")
total <- tbl(monetdb_conn, "total")
voyages <- tbl(monetdb_conn, "voyages")

Let's check out the data we just loaded. First, let's see the dimensions and the column names of the voyages table.

> dim(voyages)
[1] 8131   22
> colnames(voyages)
 [1] "number"            "number_sup"        "trip"             
 [4] "trip_sup"          "boatname"          "master"           
 [7] "tonnage"           "type_of_boat"      "built"            
[10] "bought"            "hired"             "yard"             
[13] "chamber"           "departure_date"    "departure_harbour"
[16] "cape_arrival"      "cape_departure"    "cape_call"        
[19] "arrival_date"      "arrival_harbour"   "next_voyage"      
[22] "particulars"      

As you can see, the respose is actaully an R array.

Since we have more than 8000 rows, let's have a look at the first few.

> head(voyages)

If you are interested in what queries are executed in MonetDB you can turn on the query debug

> options(monetdb.debug.query=T)
> head(voyages)
QQ: 'SELECT "number", "number_sup", "trip", "trip_sup", "boatname", "master", "tonnage", "type_of_boat", "built", "bought", "hired", "yard", "chamber", "departure_date", "departure_harbour", "cape_arrival", "cape_departure", "cape_call", "arrival_date", "arrival_harbour", "next_voyage", "particulars"
FROM "voyages"
LIMIT 6'
QQ: Query result for query 29 with 6 rows and 22 cols, 6 rows.

As you can see, the connector uses the native LIMIT SQL function for the head query.

Disable the debug view for now.

> options(monetdb.debug.query=F)

Let's see how many impotenten we have in the dataset:

> summarise(impotenten, count = n())

   count
1  938
.. ...

It is possible to play with the set in many ways, to find out several things that took place during the voyages of the ships, or the money that was earned. A few examples are shown below:

> invoices_grouped_per_chamber <- group_by(invoices, chamber)
> average_invoices_per_chamber <- summarise(invoices_grouped_per_chamber, average = mean(invoice))
> arrange(average_invoices_per_chamber, desc(average))
Arrange: desc(average) 

   chamber  average
1        A 282996.6
2        Z 259300.5
3        H 150182.4
4        R 149628.3
5        D 149522.8
6        E 149518.0
7       NA  83309.0
..     ...      ...

See also

If you want to lear more about MonetDB.R connector or the dplyr, follow the links below:

The MonetDB also supports embedded R queries in the database. This feature was introduced in the Oct2014 release. Check out this blog post for more information.

About the VOC dataset

VOCThe data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships returned safely (or wrecked on the way) by the VOC (or Dutch East India Company). A total of 8000 records are provided. They include information about ship name and type, captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information. You can find more about the VOC on Wikipedia.

Getting Started

Getting Started giulia Mon, 02/24/2020 - 16:59

Now that we have a working setup for MonetDB, it's time to start using it!

In this tutorial we are going to cover the basics for data loading and operations with MonetDB

Create a Database

Create a Database mk Tue, 02/25/2020 - 10:22

 VOCThe MonetDB/SQL tutorial is based on the material published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar, Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships that returned safely (or wrecked on the way). A total of 8000 records are provided. They include information about ship name and type, captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information.

The first step to use MonetDB on Linux is starting daemon monetdbd, code name meriovingian. It runs in the background and facilitates, amongst others, management of local/remote servers.  The daemon is controlled by the application monetdb. See its documentation for all details. The monetdbd is not available on Windows, but for all these systems a default database called demo is available by default (you can find the Windows tutorial here). 
 
Here we demonstrate a simple session involving creation of a database, loading data, and querying. 
 
________________________________________________________________________________________________
shell> monetdbd create /path/to/mydbfarm
shell> monetdbd start /path/to/mydbfarm
shell> monetdb create voc
shell> monetdb release voc
shell> mclient -u monetdb -d voc
password:<monetdb>
Welcome to mclient, the MonetDB/SQL interactive terminal (Oct2014-SP1)
Database: MonetDB v11.19.7 (Oct2014-SP1), 'mapi:monetdb://localhost:50000/voc'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>SELECT 'hello world';
+---------------+
| single_value  |
+===============+
| hello world   |
+---------------+
1 tuple (0.530ms)
sql>\q
_________________________________________________________________________________________

The command \q or end-of-file <Ctrl-d> signal terminates the connection with the server.

Create Users and Schemas

Create Users and Schemas mk Tue, 02/25/2020 - 10:47

In order to explore the wealth of functionality offered by MonetDB/SQL, the best strategy is to get started using a toy database. For this we use the VOC database which provides a peephole view into the administrative system of an early multi-national company, the Vereenigde geoctrooieerde Oostindische Compagnie (VOC for short - The (Dutch) East Indian Company) established on March 20, 1602.

First thing we need to do is download the dataset. The VOC dataset can be downloaded here voc_dump.zip (542K) gz (519K) bz2 (371K) and it is a compressed file with SQL statements.

After the file has been extracted, load its contents into MonetDB using mclient. Before you load the VOC data set, it is advised to introduce a user different from the omnipresent default monetdb.

The new user is given his own schema for the MonetDB database. Assuming you have started MonetDB with SQL module, proceed as follows:

_________________________________________________________________________________________

shell> mclient -u monetdb -d voc
password:<monetdb>
sql>CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
sql>CREATE SCHEMA "voc" AUTHORIZATION "voc";
sql>ALTER USER "voc" SET SCHEMA "voc";
sql>\q

_________________________________________________________________________________________

To illustrate the use of the newly created account and schema, the following example connects and creates a table, which is dropped afterwards by aborting the transaction.

_________________________________________________________________________________________

shell> mclient -u voc -d voc
password:<voc>
sql>START TRANSACTION;
auto commit mode: off

sql>CREATE TABLE test (id int, data varchar(30));
operation successful (0.722ms)

sql>\d
TABLE  voc.test

sql>\d test
CREATE TABLE "voc"."test" (
    "id"   INTEGER,
    "data" VARCHAR(30)
);
sql>CREATE TABLE "voc"."test" ("id" int, "data" varchar(30));
sql>ROLLBACK;
auto commit mode: on

sql>\d

_________________________________________________________________________________________

Load Data

Load Data mk Tue, 02/25/2020 - 10:40

It is possible to populate the database importing data from a file. The voc_dump.sql file can be imported into the database using the textual client interface. Some alternative ways are as follows:

_________________________________________________________________________________________

1: shell> mclient -u voc -d voc voc_dump.sql
password:<voc>
2:shell> mclient -u voc -d voc < voc_dump.sql
password:<voc>
3:shell> mclient -u voc -d voc
password:<voc>
sql> \< voc_dump.sql
___________________________________________________________________________________

 

After the import procedure is complete, we can access it through the mclient interface. The VOC data set contains data for around 8000 voyages.

___________________________________________________________________________________
sql>\d
TABLE  voc.craftsmen
TABLE  voc.impotenten
TABLE  voc.invoices
TABLE  voc.passengers
TABLE  voc.seafarers
TABLE  voc.soldiers
TABLE  voc.total
TABLE  voc.voyages

_________________________________________________________________________________________

Let's check out the data we just loaded. First, count the number of rows the voyages table.

_________________________________________________________________________________________

sql>SELECT count(*) FROM voyages;
+------+
| L1   |
+======+
| 8131 |
+------+

_________________________________________________________________________________________

In the next section we are going to show examples on how to play with views and queries

SQL Primer

SQL Primer giulia Tue, 02/18/2020 - 11:44

In this section we are going to give an overview on basic operations that can be performed on a dataset using MonetDB

The SQL Language

The SQL Language giulia Mon, 03/16/2020 - 14:30

MonetDB offers full support for SQL syntax. For this reason, the main mean of interaction with the data is through SQL statements. 

Under construction

Create Views

Create Views mk Mon, 03/16/2020 - 15:26

The set consists of 8 tables, which are all bound to each other using FOREIGN KEY relationships. The voyages table is the main table, which all others refer to. Every table, except invoices has a PRIMARY KEY defined over the columns number and number_sup. Since the invoices table holds zero or more invoices per voyage (identified by number, number_sub) a PRIMARY KEY constraint is not possible. The tables craftsmen, impotenten, passengers, seafarers, and soldiers all share the same columns. We can define a VIEW that combines them all into one big table, to make them easier to access.

_________________________________________________________________________________________

sql>CREATE VIEW onboard_people AS
SELECT * FROM (
SELECT 'craftsmen' AS type, craftsmen.* FROM craftsmen
UNION ALL
SELECT 'impotenten' AS type, impotenten.* FROM impotenten
UNION ALL
SELECT 'passengers' AS type, passengers.* FROM passengers
UNION ALL
SELECT 'seafarers' AS type, seafarers.* FROM seafarers
UNION ALL
SELECT 'soldiers' AS type, soldiers.* FROM soldiers
UNION ALL
SELECT 'total' AS type, total.* FROM total
) AS onboard_people_table;

_________________________________________________________________________________________

The new view will show up and we can just use it as a normal table:

_________________________________________________________________________________________

sql>\d
TABLE  voc.craftsmen
TABLE  voc.impotenten
TABLE  voc.invoices
VIEW   voc.onboard_people
TABLE  voc.passengers
TABLE  voc.seafarers
TABLE  voc.soldiers
TABLE  voc.total
TABLE  voc.voyages

_________________________________________________________________________________________

Simple Queries

Simple Queries mk Mon, 03/16/2020 - 15:28

With the view that we just created, we can for instance calculate the number of records for each group of people:

_________________________________________________________________________________________

sql>SELECT type, COUNT(*) AS total 
FROM onboard_people GROUP BY type ORDER BY type;
+------------+-------+
| type       | total |
+============+=======+
| craftsmen  |  2349 |
| impotenten |   938 |
| passengers |  2813 |
| seafarers  |  4468 |
| soldiers   |  4177 |
| total      |  2467 |
+------------+-------+

_________________________________________________________________________________________

Let's see how many impotenten we have in the dataset:

_________________________________________________________________________________________

sql>SELECT count(*) FROM impotenten;
+------+
| L1   |
+======+
|  938 |
+------+

_________________________________________________________________________________________

It is possible to play with the set in many ways, to find out several things that took place during the voyages of the ships, or the money that was earned.

A few examples are shown below:

_________________________________________________________________________________________

sql>SELECT COUNT(*) FROM voyages
WHERE particulars LIKE '%_recked%';
+------+
| L1   |
+======+
|  358 |
+------+
sql>SELECT chamber, CAST(AVG(invoice) AS integer) AS average
FROM invoices
WHERE invoice IS NOT NULL
GROUP BY chamber
ORDER BY average DESC;
+---------+---------+
| chamber | average |
+=========+=========+
| A       |  282996 |
| Z       |  259300 |
| H       |  150182 |
| R       |  149628 |
| D       |  149522 |
| E       |  149518 |
| null    |   83309 |
+---------+---------+
sql>CREATE VIEW extended_onboard AS
SELECT number, number_sup, trip, trip_sup,
    onboard_at_departure, death_at_cape,
    left_at_cape, onboard_at_cape,
    death_during_voyage, onboard_at_arrival,
    death_during_voyage - left_at_cape AS death_at_arrival
FROM onboard_people;
sql>\d
TABLE  voc.craftsmen
VIEW   voc.extended_onboard
TABLE  voc.impotenten
TABLE  voc.invoices
VIEW   voc.onboard_people
TABLE  voc.passengers
TABLE  voc.seafarers
TABLE  voc.soldiers
TABLE  voc.total
TABLE  voc.voyages

_________________________________________________________________________________________

Advanced Features

Advanced Features mk Tue, 02/18/2020 - 14:06

MonetDB has been designed to use the available resources without many tuning knobs. Functional enhancements are collected in autoloading scripts, which are executed when a database is created.

Other resources are mostly gathered upon need from the operating system environment. MonetDB aims to use as much of the main memory available, as many cores as can be practically deployed in parallel processing of queries, and trying to avoid going to a slow disk. As such, it is a rather aggressive system, which find immediate competition with ongoing heavy demand applications, e.g. a browser with a ton of open windows.

We recommend to use MonetDB in production settings on a dedicated server or a sizeable Cloud server.

Autoloading Scripts

Autoloading Scripts mk Mon, 05/04/2020 - 14:08

Autoloading  is used to initialize a new SQL catalog with scripts containing tables, functions, and procedures while creating a new database with monetdb. Such scripts are kept in the MonetDB library area ./lib/monetdb5/createdb/ . Look them up for the details. The prefix number is used to direct the order of their execution.  A summary of the current collection is illustrated below

Script Description
09_like.sql String regular pattern matching using PCRE
10_math.sql Polar arithmetic
12_url.sql URL manipulations (see manual)
13_date.sql POSIX formatted date coercions ((see man strptime/strftime)
14_inet.sql PostgreSQL inet data type (see manual)
15_querylog.sql Query history management (see manual)
16_tracelog.sql Query trace table (see manual)
17_temporal.sql Additional date, time and timestamp functions
18_index.sql Experimental Ordered index functions
20_vacuum.sql Table vacuum commands (experimental!)
21_dependency_views.sql Schema object dependencies analysis
22_clients.sql Client administration procedures
23_skyserver.sql Application support library
25_debug.sql Debugging tools for optimziers
26_sysmon.sql Active SQL query queue management
27_rejects.sql COPY into reject management
39_analytics.sql
39_analytics_hge.sql
Analytic functions
40_geom.sql Optional Geometry extension library (see manual)
40_json.sql
40_json_hge.sql
JSON data type support functions
41_md5sum.sql md5 checksum function
45_uuid.sql UUID data type functionality
46_profiler.sql start/stop collecting MonetDB profiler traces in a profiler_logs pool with a specific heartbeat
51_sys_schema_extension.sql Additional tables / views in "sys" schema.
60_wlcr.sql Workload Capture and Replay functions.
72_fits.sql Optional Flexible Image Transport System (FITS) files extension library
(see astronomical-data-analysis-monetdb-data-vaults)
74_netcdf.sql Optional Network Common Data Form (NetCDF) files extension library
(see Extensions/DataVaults)
75_lidar.sql Optional LiDAR data files extension library
75_shp.sql Optional SHP files extension library
75_storagemodel.sql Storage footprint analysis and prediction.
80_statistics.sql Analyze tables for optimizer statistics
80_udf.sql
80_udf_hge.sql
Example C UDF functions.
85_bam.sql Optional BAM/SAM (Sequence Alignment/Map) format files extension library
(see BAM/SAM module)
90_generator.sql
90_generator_hge.sql
Table producing functions to generate series of numbers or timestamps
99_system.sql System administration (internal)

Along the same line, MonetDB library area ./lib/monetdb5/autoload/ contains MAL scripts to be executed each time the server is started.

Data Compression

Data Compression mk Mon, 05/04/2020 - 14:43

A distinctive feature of column stores is to apply aggressive data compression. However, compression is often a two-edged sword, where movement of large data files over relative slow networks, disk access or memory interconnects is compensated for by applying CPU cycles. The effectiveness of which strongly depends on the mapping from database schema into data structures,  their high maintenance cost, the relational algorithms,  the system architecture, and the data distribution. The compression ratios cited  depends on the input size, which is commonly assumed to be in CSV format, the data distribution, and the database storage footprint, with or without auxiliary data structures like indices.

MonetDB applies different compression techniques automatically at many levels.

The column store representation is highly optimized, where the basic storage structure is a dense array, i.e. without holes to accommodate future insertions or overhead caused by the data structure itself (e.g. B-trees). This dense representation allows for direct mapping of the database files into memory. The storage width ranges from 1 (byte) to 8 bytes (doubles).  NULL values are part of the domain space, which avoids auxiliary bit masks at the expensive of 'loosing' a single value from the domain.

All strings are stored using dictionary encoding. This significantly reduces their storage space, but with larger dictionaries the maintenance cost may become expensive. Therefore for really large dictionary tables, MonetDB resort to non-compressed string representation. The references into the dictionary table occupy anywhere from 1 to 8 bytes, depending on the number of elements.

During query evaluation, a dense range of results is represented by a column view. This is a small footprint representation of the result set. It avoids both copying the result and storing it in its private column structure.

Disk Space

Disk Space mk Mon, 05/04/2020 - 14:44

The disk space footprint is determined by the way columns are being stored. MonetDB uses dictionary encoding for string columns, but aside from this there is no default compression applied to reduce the disk footprint. The prime reason being the two-headed sword of compression. It saves (cheap) disk space and IO bandwidth at the cost of expensive CPU (de)compression overhead (See compression). Since all columns are memory mapped upon access, i.e. they need not be decompressed.  If disk space comes at a premium and memory residency can be guaranteed for a long time, then a compressed file system, e.g. BTRFS, may become helpful. Its compression behavior is often as good as  dedicated algorithms executed within the critical path of a query execution.

The disk footprint can be assessed using the (Linux) command 'du' on the dbfarm directory or to run the query 'select * from storage();', provided the sql extensions are pre-loaded into your database. (See storage model)

Running out of diskspace

One of the features of MonetDB's execution model is that all intermediates are materialized as memory mapped files. A consequence of this approach is that when memory is too small to keep them around, they will be swapped to disk by the operating system. This can be seen as a decaying free space and ultimately a full disk. In turn this (should) lead to a single transaction abort and removing its disk claim. Evidently, filling your disk depends on the number of users and the complexity of their queries. It aids to the cost of running COPY INTO and queries concurrently.

If you have limited resources for concurrent access then the monetdb funnel may be an option to serialize the user requests.  A single server can have multiple funnels.

Memory Footprint

Memory Footprint mk Mon, 05/04/2020 - 14:10

MonetDB requires all data that needs to be active at any given point in time to fit into the address space --- and of course to fit on the storage device, (i.e., your disk system). On 32-bit systems, the address space is at most 32-bit (4 GB); in practice, it is actually limited to 3 GB or even 2 GB on most systems. On 64-bit systems, the address space can theoretically be 64-bit, but in practice is often "limited" to 48-bit or so --- not that that makes any difference ...

MonetDB excessively uses main memory for processing, but does not require that all data fit in the available physical memory. To handle a dataset that exceeds the available physical memory, MonetDB does not (only) rely on the available swap space, but (also) uses memory-mapped files to exploit disk storage beyond the swap space as virtual memory.

For example, while bulk-loading data (preferably using a COPY INTO statement from a (possibly compressed) CSV file), MonetDB needs to have all columns of the table that is currently being loaded "active", i.e., accessible in the address space. However, during loading, parts of the data are continuously written to the persistent files on disk, i.e., the whole table does not have to fit into main memory. E.g., loading a 100 GB table works fine on a system with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk space.

During query processing, MonetDB requires for each single MAL operation during the query execution that all its inputs, its outputs, and possible temporary data structures fit in the address space. This won't be a problem on 64-bit machines, but may be experienced if you are close to the limit on 32-bit machines. MonetDB automatically resorts to virtual memory and memory-mapped files for large intermediate results. Also (large) persistent tables are accessed using memory mapping.

While running, you might see your mserver5 process' virtual size grow well beyond the available physical memory and possibly also well beyond your swap space.  In principle, this is not a problem at all.  Most of this virtual size is due to the fact that large base tables (or intermediate results) that reside as files on disk are memory-mapped into the address space. Those parts of the data that are currently not accessed do not consume any physical memory (except possibly for caching purposes).

However, if individual columns of your table(s) and/or individual columns of intermediate results exceed the size of the available physical memory, the performance of MonetDB might (will) decrease due to increased I/O requirements.

Multicore Usage

Multicore Usage mk Mon, 05/04/2020 - 14:12

Modern systems come with many CPUs on a single die and further augmented with hyperthreading to increase parallel processing. MonetDB will use as much of the cores as possible for both inter- and intra- parallelism.

A command line argument gdk_nr_threads=<number> can be set to  experiment with a limited number of cores assigned to each user session.

OOM Killer

OOM Killer mk Mon, 05/04/2020 - 14:46

The MonetDB server may become a victim of Linux kernel functionality  called Out Of Memory Killer (or OOM Killer) responsible for dealing with excessive memory requirements.
If the system reaches a point where it may  run out of memory, OOM Killer looks for victim process and ends its life the hard way.

In most cases, a simple restart of the server will suffice, but when the killer enters during a complex update transaction then the database may end-up in an inconsistent state.

Troubleshooting

Troubleshooting giulia Tue, 02/18/2020 - 16:51

Throughout your experience with MonetDB, you might incur in bugs and issues that are not expected. 

While we cannot give a solution to that right away, there are some tools that can help the debugging and tracing process to ease yoor experience in fixing issues.

Debugging

Debugging mk Mon, 03/02/2020 - 14:07

In most cases, the system produces  informative error messages. However, there are situations where MonetDB enters an area not covered by the test suite or previous use and a segmentation fault occurs. These cases are hard to analyze outside the development lab. To isolate and resolve the issue we need at least the following information.

Sent us the result of the command mserver5 --version --dbname=<databasename> or the equivalent using monetdb --version <databasename>

Is the error reproducible with a small (5-10 line) script/query/program? Trim your experiment to the minimal size that demonstrates the erroneous behavior. Such a script is the best we can hope for, because it will end up in the nightly testing.

In addition, follow the steps, assuming that you are logged onto the same Linux (!) machine as where the server will run:

  1. Sent the tail of the merovingian log (100 lines) located in the dbfarm directory.
  2. Start the database server either with monetdb  or as a full command line mserver --dbinit="..." .
  3. Lookup the process id for this server using the command  ps -al |grep mserver5
  4. Attach the debugger with gdb mserver5 <processid>
  5. Run your experiment again until it hits the segmentation fault, which you see emerging in window.
  6. Then obtain a stack trace with the command where. It shows of the context of the crash using the list command and the variables used in the context of the crash.
  7. Collect the status of all threads using the command thr app all bt.

 

Tracer

Tracer giulia Mon, 03/02/2020 - 14:08

Under construction