Science Library

MonetDB has been the basis for database research at CWI since 1993. Many people have contributed both useful code and user experience. Their research results are readily available through Google Scholar. The selection of key publications summarized below is a good starting point for becoming acquainted with the design considerations and these experiences. The BibTeX record is provided for ease of referencing our work in your own papers.

Key Publications

Overview papers

Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, Martin L. Kersten: MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35(1): 40-45 (2012)

Peter A. Boncz, Martin L. Kersten, Stefan Manegold: Breaking the memory wall in MonetDB. Commun. ACM 51(12): 77-85 (2008)

Indexing

A column imprint is a simple but efficient cache-conscious secondary index. It is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly.

Lefteris Sidirourgos, Martin L. Kersten: Column imprints: a secondary index structure. SIGMOD Conference 2013: 893-904
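The idea behind column imprints can be illustrated with a minimal Python sketch. It is not the MonetDB implementation: the function names, the fixed bin boundaries, and the assumption of 16 values per cacheline are hypothetical choices for illustration, and the imprint compression step is left out entirely. Each cacheline gets one bit vector in which bit b is set when some value of that cacheline falls into histogram bin b; a range query then skips every cacheline whose vector shares no bit with the query mask.

```python
from bisect import bisect_right

VALUES_PER_CACHELINE = 16  # assumption: a 64-byte cacheline holding 4-byte values


def bin_of(value, bounds):
    """Return the index of the histogram bin that holds value.

    bounds is a sorted list of bin boundaries (an illustrative choice).
    """
    return bisect_right(bounds, value)


def build_imprints(column, bounds):
    """Build one bit vector per cacheline; bit b is set if bin b occurs in it."""
    imprints = []
    for start in range(0, len(column), VALUES_PER_CACHELINE):
        vector = 0
        for value in column[start:start + VALUES_PER_CACHELINE]:
            vector |= 1 << bin_of(value, bounds)
        imprints.append(vector)
    return imprints


def range_select(column, imprints, bounds, low, high):
    """Return (positions of values with low <= value <= high, cachelines read)."""
    # Mask of all bins that may contain qualifying values.
    mask = 0
    for b in range(bin_of(low, bounds), bin_of(high, bounds) + 1):
        mask |= 1 << b
    positions = []
    touched = 0
    for line, vector in enumerate(imprints):
        if (vector & mask) == 0:
            continue  # no value in this cacheline can qualify, so skip it
        touched += 1
        start = line * VALUES_PER_CACHELINE
        for offset, value in enumerate(column[start:start + VALUES_PER_CACHELINE]):
            if low <= value <= high:
                positions.append(start + offset)
    return positions, touched
```

For a column holding the values 0 to 63 in order and the boundaries [16, 32, 48], a query for the range 20 to 25 reads a single cacheline out of four. A matching bit only says that a cacheline may contain qualifying values, so each value is still checked against the predicate.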

Point Clouds

MonetDB has been used to handle multi-billion point clouds, such as the Dutch height map at a resolution of 3 cm. A benchmark has been designed to assess technological progress for this highly demanding application.

Oscar Martinez-Rubi, Peter van Oosterom, Romulo Goncalves, Theo Tijssen, Milena Ivanova, Martin L. Kersten, Foteini Alvanaki: Benchmarking and improving point cloud data management in MonetDB. SIGSPATIAL Special 6(2): 11-18 (2014)

Cluster Analysis

Blaeu is a novel solution to visual analytics, where the key challenge is to guide the user in exploring datasets without prior knowledge.

Thibault Sellam, Martin L. Kersten: Cluster-Driven Navigation of the Query Space. IEEE Trans. Knowl. Data Eng. 28(5): 1118-1131 (2016)

Innovations

The MonetDB team received the VLDB 10-year Best Paper award for their contribution to the database community. The invited paper marking this occasion and the related Communications of the ACM publication provide a condensed summary of the key ideas within the research.

Peter A. Boncz, Stefan Manegold, Martin L. Kersten: Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct. PVLDB 2(2): 1648-1653 (2009).

Peter A. Boncz, Martin L. Kersten, Stefan Manegold: Breaking the memory wall in MonetDB. Commun. ACM 51(12): 77-85 (2008)

Query Recycling

Query recycling received the SIGMOD 2009 runner-up award as a novel way to deal with intermediate results during query processing. Its deployment against the Sloan Digital Sky Survey database illustrated its ability to capture materialized views automatically, with a four-fold throughput improvement.

Milena Ivanova, Martin L. Kersten, Niels J. Nes, Romulo Goncalves: An architecture for recycling intermediates in a column-store. ACM Trans. Database Systems. 35(4): 24 (2010).

Database Cracking

Database cracking is a technique that shifts the cost of index maintenance from updates to query processing. It challenges the whole software stack: optimizers have to massage query plans so that they crack the data, and propagate the resulting information to improve response times further. The work received the ACM SIGMOD 2011 Jim Gray best dissertation award.

Stratos Idreos, Martin L. Kersten, Stefan Manegold: Self-organizing tuple reconstruction in column-stores. SIGMOD Conference 2009: 297-308.

Stratos Idreos, Martin L. Kersten, Stefan Manegold: Updating a cracked database. SIGMOD Conference 2007: 413-424
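The basic mechanism of database cracking, reorganizing a column as a side effect of the range queries that touch it, can be sketched in a few lines of Python. This is a simplified illustration and not the MonetDB code: the class and method names are hypothetical, updates and tuple reconstruction (the subjects of the papers above) are not covered, and each piece is partitioned with list comprehensions where a real system would swap values in place.

```python
from bisect import bisect_left, insort


class CrackedColumn:
    """Hypothetical cracker column: a copy of the data that queries reorder."""

    def __init__(self, values):
        self.data = list(values)  # copy that is physically reorganized
        self.pivots = []          # sorted pivot values seen so far
        self.splits = {}          # pivot -> first position holding a value >= pivot

    def _crack(self, pivot):
        """Move values < pivot before values >= pivot; return the split position."""
        if pivot in self.splits:
            return self.splits[pivot]  # this boundary is already known
        # Find the piece that contains the pivot, using the neighbouring pivots.
        i = bisect_left(self.pivots, pivot)
        lo = self.splits[self.pivots[i - 1]] if i > 0 else 0
        hi = self.splits[self.pivots[i]] if i < len(self.pivots) else len(self.data)
        # Partition only that piece; the rest of the column is left untouched.
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        insort(self.pivots, pivot)
        self.splits[pivot] = split
        return split

    def range_select(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]
```

After the queries `range_select(5, 10)` and `range_select(10, 14)`, the column is split into pieces bounded by the pivots 5, 10 and 14. The second query only has to partition the piece of values that are at least 10, and a repeated query finds both of its boundaries without moving any data.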

Data Cyclotron

Running a distributed database system is often the way to improve response time or throughput. The latter is the focus of the Data Cyclotron project, which assumes that the database hot-set is small enough to perpetually pass it through the network to visit all nodes repeatedly.

Romulo Goncalves, Martin L. Kersten: The Data Cyclotron query processing scheme. ACM Trans. Database Systems 36(4) (2011). The original paper was published in EDBT 2010: 75-86.

DataCell

Continuous query processing over streams is a longstanding challenge. High-volume data rates do not require designing a complete system from scratch. Proper use of the bulk operators in a relational engine, combined with an optimizer that produces incremental plans, gives you the required functionality and speed.

Erietta Liarou, Romulo Goncalves, Stratos Idreos: Exploiting the power of relational databases for efficient stream processing. EDBT 2009: 323-334.

SciBORQ

Query processing over large science databases needs improvement. At any given time only a fraction of the data is of primary value for a specific task. This fraction becomes the focus of scientific reflection through an iterative process of ad-hoc query refinement. Steering through data to facilitate scientific discovery demands guarantees for the query execution time.

Lefteris Sidirourgos, Martin L. Kersten, Peter A. Boncz: SciBORQ: Scientific data management with Bounds On Runtime and Quality. CIDR 2011: 296-301
