MonetDB architecture

MonetDB architecture mk Tue, 03/30/2010 - 11:53

General overview. MonetDB belongs to the class of database management systems designed primarilly for datawarehouse environments. They are characterised by large repositories, which are mostly queried for business intelligence decisions. They also appear frequently in the area of science, where observations are collected into a warehouse for subsequent scientific analysis. The design is focussed on bulk processing wherever possible. Aimed at maximal efficient use of the hardware for large scale processing. This design objective is reflected at all levels of the architecture and the functionality offered to the user. Although MonetDB/SQL provides a full-fledged SQL interface, it has not been tuned towards high-volume transaction processing with its accompanying multi-level ACID properties.

SQL front-end. The storage layer combined with the MAL algebra provides a flexible architecture to accomodate a wide variety of query languages.  The relational front-end decomposes tables by column with a dense (non-stored) OID head, and a tail column with values. For each relational table, a column with deleted positions is kept. Deltas  are designed to delay updates to the main columns, and allow a relatively cheap snapshot isolation mechanism. MonetDB/SQL also maintains additional join indices to speed-up foreign key join predicates; and value indices are created on-the-fly.

Design considerations

Design considerations mk Thu, 02/20/2014 - 03:53

Redesign of the MonetDB software stack was driven by the need to reduce the effort to extend the system into novel directions and to reduce the Total Execution Cost (TEC). The TEC is what an end-user or application program will notice and is composed of several cost factors:

A) API message handling
P) Parsing and semantic analysis
O) Optimization and plan generation
D) Data access to the persistent store
E) Execution of the query terms
R) Result delivery to the application
 
Choosing an architecture for processing database operations presupposes an intuition on how the cost will be distributed. In an OLTP setting you expect most of the cost to be in (P,O), while in OLAP it will be (D,E,R). In a distributed setting the components (O,D,E) are dominant. Web-applications would focus on (A,E,R).
 
Such a simple characterization ignores the widespread differences that can be experienced at each level. To illustrate, in D) and R) it makes a big difference whether the data is already in the cache or still on disk. With E) it makes a big difference whether you are comparing two integers, evaluationing a mathematical function, e.g., Gaussian, or evalauting a regular expression on a string. As a result, intense optimization in one area may become completely invisible due to being overshadowed by other cost factors.

The Version 5 infrastructure is designed to ease addressing each of these cost factors in a well-defined way, while retaining the flexibility to combine the components needed for a particular situation. It results in an architecture where you assemble the components for a particular application domain and hardware platform.

The primary interface to the database kernel is still based on the exchange of text in the form of queries and simply formatted results. This interface is designed for ease of interpretation and versatility, and is flexible to accommodate system debugging and application tool development. Although a textual interface potentially leads to a performance degradation, our experience with earlier system versions showed that the overhead can be kept within acceptable bounds. Moreover, a textual interface reduces the programming effort otherwise needed to develop test and application programs.
 

Storage model

Storage model mk Wed, 02/26/2014 - 13:03

The storage model deployed in MonetDB is a significant deviation of traditional database systems. It represents relational tables using vertical fragmentation, by  storing each column in a separate {(OID,value)} table,  also called a BAT (Binary Association Table). MonetDB relies on a low-level relational algebra called the BAT algebra, which takes BATs and scalar values as input. The complete result is always stored in (intermediate) BATs, and the result of an SQL query is a collection of BATs.

Each column, or BAT, is implemented as an ordinary C-array where the OID maps to the index in the array possibly extended with a so-called sequence-base offset. The persistent version of a BAT is a memory mapped file. OID lookup becomes a fast array indexed read into the tail column. In effect, this use of arrays in virtual memory exploits the fast in-hardware address to disk-block mapping implemented by the  MMU (memory management unit) in a CPU to provide an O(1) positional database lookup mechanism.

To ensure fast access wherever possible, all (relational) operators exploit a small set of properties maintained under updates and intermediates creation.

seq       - the sequence base, a mapping from array index 0 into a OID value
key       - the values in the column are unique
nil       - there is at least one NIL value
nonil     - it is unknown if there NIL values
dense     - the numeric values in the column form a dense sequence
sorted    - the column contains a sorted list for ordered domains
revsorted - the column contains a reversed sorted list

Execution model

Execution model mk Wed, 02/26/2014 - 13:05

The MonetDB kernel is an abstract machine, programmed in the MonetDB Assemblee Language (MAL). Each relational algebra operator corresponds to a MAL instruction.  Each BAT algebra operator maps to a simple MAL instruction, which has zero degrees of freedom; it does not take complex expressions as parameter. Rather, complex expressions are broken down into a sequence of BAT algebra operators that each perform a simple operation on an entire column of values (``bulk processing''). This allows the implementation of the BAT algebra to forsake an expression interpreting engine; rather all BAT algebra operations in the implementation map onto simple array operations.
For instance, the BAT algebra expression
    R:bat[:oid]:=select(B:bat[:int], V:int);
can be implemented at the C code level like:
     for (i = j = 0; i < n; i++)
       if (B.tail[i] == V)
         R.tail[j++] = i;

The BAT algebra operators have the advantage that tight for-loops create high data access and instruction locality. Such simple loops are amenable to compiler optimization (loop pipelining, vectorization, blocking, strength reduction), and CPU out-of-order speculative execution.

Software stack

Software stack mk Wed, 02/26/2014 - 13:07

MonetDB's query processing scheme is centered around three software layers. The top is formed by the query language parser and a heuristic, language - and data model - specific optimizer to reduce the amount of data produced by intermediates and to exploit catalogue knowledge, such as foreign key constraints . The output is a logical plan expressed in MAL.

The second tier consists of a collection of optimizer modules, which are assembled into an optimization pipeline. Each optimizer takes a MAL plan and transforms it into a more efficient one, optionally sprinkled with resource management directives and flow of control directives.  The modules provide facilities ranging from symbolic processing  up to just-in-time data distribution and execution.  The approach breaks with the hitherto omnipresent cost-based optimizers by recognition that not all decisions can be cast together in a single cost formula. Operating on the common binary-relational back-end algebra, these optimizer modules are shared by all front-end data models and query languages.

The third tier, the MAL interpreter, contains the library of highly optimized implementation of the binary relational algebra operators. They maintain properties over the object accessed to gear the selection of subsequent algorithms. For example, the Select operator can benefit both from sorted-ness of the BAT or it may call for a sample to derive the expected sizes. The interface is described in the MAL module sections.