Science Library

MonetDB has been the basis for database research at CWI since 1993. Many people have contributed, in terms of both useful code and user experiences. Their scientific results are readily available through Google Scholar or the CWI repository. The selection of key publications summarized below provides a good starting point for becoming acquainted with the design considerations and these experiences. BibTeX records are provided for ease of referencing our work in your scientific papers.

OVERVIEW PAPERS

INDEXING Column imprints, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cache line. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression scheme for imprints is CPU-friendly.
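
The imprint idea described above can be illustrated with a small sketch. This is a hypothetical Python toy, not MonetDB's actual implementation: the names (`build_imprints`, `range_scan`), the bin count, and the simulated cache-line size are all invented for illustration, and real imprints use 64-bit vectors over CPU cache lines.

```python
# Toy sketch of the column-imprint idea (assumed names, not MonetDB code).
# One small bit vector ("imprint") summarizes each cache line of a column;
# a range scan probes only the cache lines whose imprint intersects the
# query's bit mask and skips the rest, reducing memory traffic.

BINS = 8          # imprint width in bits (real imprints use 64)
LINE = 4          # values per simulated cache line

def make_bins(column):
    """Equi-width bin boundaries over the column's value range."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / BINS or 1
    return [lo + i * width for i in range(1, BINS)]

def bin_of(value, bounds):
    """Index of the histogram bin holding `value`."""
    for i, b in enumerate(bounds):
        if value < b:
            return i
    return BINS - 1

def build_imprints(column, bounds):
    """One imprint per cache line: OR of the bins of its values."""
    imprints = []
    for start in range(0, len(column), LINE):
        imp = 0
        for v in column[start:start + LINE]:
            imp |= 1 << bin_of(v, bounds)
        imprints.append(imp)
    return imprints

def range_scan(column, imprints, bounds, lo, hi):
    """Return values in [lo, hi], probing only cache lines whose
    imprint overlaps the query's bin mask."""
    mask = 0
    for i in range(BINS):
        b_lo = bounds[i - 1] if i else float("-inf")
        b_hi = bounds[i] if i < BINS - 1 else float("inf")
        if not (hi < b_lo or lo >= b_hi):   # bin overlaps [lo, hi]
            mask |= 1 << i
    out = []
    for j, imp in enumerate(imprints):
        if imp & mask:                       # otherwise: skip this cache line
            out.extend(v for v in column[j*LINE:(j+1)*LINE] if lo <= v <= hi)
    return out
```

For a sorted or clustered column most imprints fail the mask test, so whole cache lines are never touched; that is the source of the memory-traffic savings.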

POINT CLOUDS MonetDB has been used to handle multi-billion-point clouds, such as the Dutch height map at a resolution of 3 cm. A benchmark has been designed to assess progress in technology for this highly demanding application.


CLUSTER ANALYSIS Blaeu is a novel solution for visual analytics, where the key challenge is to guide the user in exploring datasets without prior knowledge.

Lefteris Sidirourgos, Martin L. Kersten: Column imprints: a secondary index structure. SIGMOD Conference 2013: 893-904 [bibtex]


INNOVATIONS. The MonetDB team received the VLDB 10-year Best Paper Award for its contribution to the database community. The invited paper marking this occasion and the related Communications of the ACM publication provide a condensed summary of the key ideas behind the research.

QUERY RECYCLING. The SIGMOD 2009 runner-up award was received for a novel way to deal with intermediate results during query processing. Its deployment against the Sloan Digital Sky Survey database illustrated its ability to capture materialized views automatically, with a four-fold throughput improvement.
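
In spirit, recycling amounts to memoizing the materialized results of plan fragments so that overlapping queries reuse earlier work. The sketch below is a hypothetical illustration under that assumption; the class name `Recycler` and its interface are invented, and the real system manages an admission policy and a bounded resource pool that this toy omits.

```python
# Hypothetical sketch of recycling intermediates (invented names, not
# MonetDB code): the materialized result of each (operator, arguments)
# plan fragment is kept in a pool, so a later query containing the same
# fragment reuses the intermediate instead of recomputing it.

class Recycler:
    def __init__(self):
        self.pool = {}     # (op_name, args) -> materialized intermediate
        self.hits = 0      # how often earlier work was reused

    def run(self, op_name, fn, *args):
        """Evaluate plan fragment `op_name(*args)` via `fn`, reusing a
        pooled intermediate when the identical fragment ran before."""
        key = (op_name, args)
        if key in self.pool:
            self.hits += 1
        else:
            self.pool[key] = fn(*args)
        return self.pool[key]
```

Two queries that share a selection over the same table then pay for it only once, which is where the throughput improvement reported above comes from.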

DATABASE CRACKING. Database cracking is a technique that shifts the cost of index maintenance from updates to query processing. It challenges the whole software stack, e.g. using the optimizers to massage query plans so that columns are cracked, and propagating this information to further improve response times. The work received the ACM SIGMOD 2011 Jim Gray Doctoral Dissertation Award.
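
The core mechanism can be sketched in a few lines. This is a toy illustration of crack-in-two, not MonetDB's code: the class and method names are invented, and the real cracker index also tracks the mapping back to row identifiers.

```python
# Toy sketch of database cracking (assumed names, not MonetDB code).
# Every range query physically reorganizes the touched pieces of the
# column a little, so the column becomes progressively more sorted and
# future queries scan ever smaller pieces -- index maintenance is paid
# during query processing, not during updates.

import bisect

class CrackerColumn:
    def __init__(self, values):
        self.col = list(values)   # cracker column, partially ordered
        self.index = []           # sorted (pivot, position) boundaries

    def _piece(self, value):
        """Bounds [lo, hi) of the piece that may contain `value`."""
        i = bisect.bisect_right(self.index, (value, len(self.col)))
        lo = self.index[i - 1][1] if i > 0 else 0
        hi = self.index[i][1] if i < len(self.index) else len(self.col)
        return lo, hi

    def _crack(self, pivot, lo_pos, hi_pos):
        """Partition col[lo_pos:hi_pos] so values < pivot come first;
        record the split point in the cracker index."""
        part = self.col[lo_pos:hi_pos]
        left = [v for v in part if v < pivot]
        self.col[lo_pos:hi_pos] = left + [v for v in part if v >= pivot]
        split = lo_pos + len(left)
        bisect.insort(self.index, (pivot, split))
        return split

    def select_range(self, low, high):
        """Answer `low <= v < high` (low <= high), cracking the touched
        pieces as a side effect of the scan."""
        a = self._crack(low, *self._piece(low))
        b = self._crack(high, *self._piece(high))
        return self.col[a:b]
```

Note that the answer comes back in piece order, not sorted order: within each piece the values stay unsorted until some later query cracks them further.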

[1] Peter A. Boncz, Stefan Manegold, Martin L. Kersten: Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct. PVLDB 2(2): 1648-1653 (2009) [bibtex]
[2] Peter A. Boncz, Martin L. Kersten, Stefan Manegold: Breaking the memory wall in MonetDB. Communications of the ACM 51(12): 77-85 (2008) [bibtex]
[3] Milena Ivanova, Martin L. Kersten, Niels J. Nes, Romulo Goncalves: An architecture for recycling intermediates in a column-store. ACM Trans. Database Syst. 35(4): 24 (2010) [bibtex]. The original paper was published in SIGMOD Conference 2009: 309-320 [bibtex].
[4] Stratos Idreos, Martin L. Kersten, Stefan Manegold: Self-organizing tuple reconstruction in column-stores. SIGMOD Conference 2009: 297-308 [bibtex]
[6] Stratos Idreos, Martin L. Kersten, Stefan Manegold: Updating a cracked database. SIGMOD Conference 2007: 413-424 [bibtex]

DATA CYCLOTRON. Running a distributed database system is often the way to improve response time or throughput. The latter is the focus of the Data Cyclotron project, which assumes that the database hot set is small enough to be passed around the network perpetually, visiting all nodes repeatedly.

DATACELL. Continuous query processing over streams is a longstanding challenge. High volume rates do not require a complete system design from scratch: proper use of the bulk operators of a relational engine, combined with an optimizer that produces incremental plans, gives the required functionality and speed.

SCIBORQ. Query processing over large science databases needs improvement. At any given time only a fraction of the data is of primary value for a specific task. This fraction becomes the focus of scientific reflection through an iterative process of ad-hoc query refinement. Steering through data to facilitate scientific discovery demands guarantees for the query execution time.

[7] Romulo Goncalves, Martin L. Kersten: The Data Cyclotron query processing scheme. ACM Trans. Database Syst. 36(4) (2011) [bibtex]. The original paper was published in EDBT 2010: 75-86 [bibtex].
[8] Erietta Liarou, Romulo Goncalves, Stratos Idreos: Exploiting the power of relational databases for efficient stream processing. EDBT 2009: 323-334 [bibtex]
[9] Lefteris Sidirourgos, Martin L. Kersten, Peter A. Boncz: SciBORQ: Scientific data management with Bounds On Runtime and Quality. CIDR 2011: 296-301 [bibtex]