Design Overview

General overview

MonetDB belongs to the class of database management systems designed primarily for data warehouse environments. They are characterised by large repositories, which are mostly queried for business intelligence decisions. They also appear frequently in the area of science, where observations are collected into a data warehouse for subsequent scientific analysis. The design is focussed on bulk processing wherever possible. Aimed at maximal efficient use of the hardware for large scale processing. This design objective is reflected at all levels of the architecture and the functionality offered to the user. Although MonetDB/SQL provides a full-fledged SQL interface, it has not been tuned towards high-volume transaction processing with its accompanying multi-level ACID properties. However, its transaction processing capability is on par with its peers in the open-source world.

The storage layer combined with the MAL algebra provides a flexible architecture to accomodate a wide variety of query languages. The relational front-end decomposes tables by column with a dense (non-stored) OID head, and a tail column with values. For each relational table, a column with deleted positions is kept around. Deltas are designed to delay updates to the main columns, and allow a relatively cheap snapshot isolation mechanism. MonetDB/SQL also maintains additional join indices to speed-up foreign key join predicates; and value indices are created on-the-fly.

Software Stack

MonetDB's query processing scheme is centered around three software layers. The top is formed by the query language parser and a heuristic, language - and data model - specific optimizer to reduce the amount of data produced by intermediates and to exploit semantic catalogue knowledge, such as foreign key constraints. The output is a logical plan expressed in the MonetDB Assembly language, called MAL.

The second tier consists of a collection of optimizer modules, which are assembled into an optimization pipeline. Each optimizer takes a MAL plan and transforms it into a more efficient one, optionally sprinkled with resource management and flow of control directives. The modules provide facilities ranging from symbolic processing up to just-in-time data distribution and execution. The approach breaks with the hitherto omnipresent cost-based optimizers by recognition that not all decisions can be cast together in a single cost formula upfront. Despite the precision of the database statistucs gathered, Choosing the best algorithm can best be postponed until you need the results and have knowledge on the properties of its arguments.

The third tier, the MAL interpreter, contains the library of highly optimized implementation of the binary relational algebra operators. They maintain properties over the object accessed to select the best algorithms. For example, the Select operator can benefit both from sorted-ness of the BAT or it may call for a sample to derive the expected sizes. The interface is described in the MAL module sections.

Execution Model

The MonetDB kernel is an abstract machine, programmed in the MonetDB Assembly Language (MAL). The language is centered around simple relational operators which take fully materialized vectors, called BATs, and scalars. Each relational algebra operator maps to a simple MAL instruction, which has zero degrees of freedom; it does not take complex expressions as parameter, just variable names. Complex expressions are broken down into a sequence of MAL operators that each perform a simple operation on an entire column of values ("bulk processing").

For instance, the relational BAT algebra expression

R:bat[:oid]:=select(B:bat[:int], V:int);

can be implemented at the C code level like:

for (i = j = 0; i < n; i++)
    if (B.tail[i] == V)
        R.tail[j++] = i;

The BAT algebra operators have the advantage that tight for-loops create highly efficient data access and instruction locality. Such simple loops are amenable to compiler optimization (loop pipelining, vectorization, blocking, strength reduction), and CPU out-of-order speculative execution.

Storage Model

The storage model deployed in MonetDB is a significant deviation of traditional database systems, where tables are represented as tuples organized into blocks and accessed using a block cache. Instead, it represents relational tables using vertical fragmentation, by storing each column in a separate {(OID, value)} table, also called a BAT (Binary Association Table). The complete result is always stored in (intermediate) BATs, and the result of an SQL query is a collection of BATs.

Each column, or BAT, is implemented as an ordinary C-array where the OID maps to the index in the array possibly extended with a so-called sequence-base offset. The persistent version of a BAT is a memory mapped file. This use of arrays in virtual memory exploits the fast in-hardware address to disk-block mapping implemented by the MMU (memory management unit) in a CPU to provide an O(1) positional database lookup mechanism.

To ensure fast access wherever possible, all (relational) operators exploit a small set of BAT properties maintained under updates and intermediates creation.

 seq        - the sequence base, a mapping from array index 0 into a OID value
 key        - the values in the column are unique
 nil        - there is at least one NIL value
 nonil      - it is unknown if there NIL values
 dense      - the numeric values in the column form a dense sequence
 sorted     - the column contains a sorted list for ordered domains
 revsorted  - the column contains a reversed sorted list
 minpos     - position of the first minimum value
 maxpos     - position of the first maximum value
 unique_est - estimated number of unique values

Performance Considerations

The design of the MonetDB software stack was driven by the need to reduce the effort to extend the system into new directions easily and to reduce the Total Execution Cost (TEC). TEC is what an end-user or application program will notice and is composed of several cost factors:

A) API message handling, P) Parsing and semantic analysis, O) Optimization and plan generation, D) Data access to the persistent store, E) Execution of the query terms, R) Result delivery to the application.

Choosing an architecture for processing database operations presupposes an intuition on how the cost will be distributed. In an OLTP setting you expect most of the cost to be in (P,O), while in OLAP it will be (D,E,R). In a distributed setting the components (O,D,E) are dominant. Web-applications would focus on (A,E,R).

Such a simple characterization ignores the widespread differences that can be experienced at each level. To illustrate, in D) and R) it makes a big difference whether the data is already in the cache or still on disk. With E) it makes a big difference whether you are comparing two integers, evaluating a mathematical function, e.g., Gaussian function, or evaluating a regular expression on a string. As a result, intense optimization in one area may become completely invisible due to being overshadowed by other cost factors.

The MonetDB architecture is designed to ease addressing each of these cost factors in a well-defined way, while retaining the flexibility to combine the components needed for a particular situation. It results in an architecture where you assemble the components for a particular application domain and hardware platform.

The primary interface to the database kernel is based on the exchange of text in the form of SQL queries and simply formatted results. This interface is designed for ease of interpretation and versatility, and is flexible to accommodate system debugging and application tool development. Although a textual interface potentially leads to a performance degradation, our experience with earlier system versions showed that the overhead can be kept within acceptable bounds. Moreover, a textual interface reduces the programming effort otherwise needed to develop test and application programs.