
MonetDB goes headless

In the early days of MonetDB (in the 1990s) there were BATs, Binary Association Tables. A binary association is considered the minimal relationship to be maintained in an RDBMS [1]. A BAT consisted of BUNs, Binary UNits, each of which consisted of a HEAD and a TAIL value. All head values of a BAT were of one type and all tail values were of one type, but the head and tail types were independent of each other. There were (and still are) a whole bunch of built-in types from which to choose, and types could (and still can) be added in extension code. The head and tail values were stored together so that a BAT was basically a C array of structs of two elements each. This was also the format in which the data was stored on disk.

The thing with a C struct that contains, say, a one-byte value and an eight-byte value is that the longer value must typically be aligned to an eight-byte boundary, and thus there are seven wasted bytes per struct (and hence, per BUN). For this reason, years ago already, we changed the format in which BATs are laid out. Instead of a single array of structs, we implemented what amounts to a struct of arrays. Since then the head values and tail values were each stored in a separate array, but those arrays had to be the same length (that is to say, the same number of elements). Using this format there is no wasted space between values. Byte-sized values don't have to be aligned on eight-byte boundaries but can be stored with no intervening space. This also became the format in which the data was stored on disk. There were two files per BAT, one for the head column and one for the tail column.
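To make the padding argument concrete, here is a minimal sketch in C that compares the two layouts for a one-byte head and an eight-byte tail. The names bun_pair, aos_bytes, soa_bytes, and padding_per_bun are made up for this illustration and are not part of MonetDB. The exact amount of padding depends on the platform's alignment rules; on common 64-bit platforms it comes to seven bytes per BUN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: one struct per BUN, head and tail stored together.
 * (Hypothetical type, for illustration only.) */
struct bun_pair {
    uint8_t head;   /* one-byte value */
    uint64_t tail;  /* eight-byte value, usually aligned to 8 bytes */
};

/* Bytes needed for n BUNs stored as one array of structs. */
size_t aos_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(struct bun_pair)); /* no overflow */
    return n * sizeof(struct bun_pair);
}

/* Bytes needed for n BUNs stored as two separate, equally long arrays. */
size_t soa_bytes(size_t n)
{
    assert(n <= SIZE_MAX / (sizeof(uint8_t) + sizeof(uint64_t)));
    return n * sizeof(uint8_t) + n * sizeof(uint64_t);
}

/* Padding bytes the compiler adds to each struct (platform dependent). */
size_t padding_per_bun(void)
{
    return sizeof(struct bun_pair) - sizeof(uint8_t) - sizeof(uint64_t);
}
```

For a million BUNs, soa_bytes returns nine million bytes, while aos_bytes returns a million times whatever sizeof(struct bun_pair) is on the platform at hand.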

Since the beginning, one of the types in MonetDB was the OID, the Object IDentifier. An OID is basically an integer, but with a special purpose. A special version of this type that was (and still is) used in BATs as either (or both) the head or the tail column is one in which each value is exactly one larger than the one before. This is something that happens a lot in MonetDB, and so, since the beginning, there is a special way of storing such values. Instead of allocating memory to store a sequence of numbers, we only store the first value, called the seqbase (sequence base). There is a special type to indicate this, named VOID, Virtual OID. Columns of type VOID don't need to be stored and in fact aren't stored. The only information that needs to be stored is the start value, and the number of elements.
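The idea of a virtual column can be sketched in a few lines of C. This is only a toy model under our own assumptions: toy_oid, toy_void_column, toy_void_value, and toy_void_find are hypothetical names, and the real MonetDB data structures look different. The point is that a value is computed from the seqbase and the position, and never read from a stored array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_oid;   /* hypothetical stand-in for the OID type */

/* A virtual column: no value array, only the start value and the length. */
struct toy_void_column {
    toy_oid seqbase;  /* first value of the sequence */
    size_t count;     /* number of (virtual) elements */
};

/* Value at position i: computed on the fly, nothing is stored. */
toy_oid toy_void_value(const struct toy_void_column *col, size_t i)
{
    assert(i < col->count);
    return col->seqbase + (toy_oid)i;
}

/* Position of value v, or (size_t)-1 if v lies outside the sequence. */
size_t toy_void_find(const struct toy_void_column *col, toy_oid v)
{
    if (v < col->seqbase || v - col->seqbase >= col->count)
        return (size_t)-1;
    return (size_t)(v - col->seqbase);
}
```

Both directions, position to value and value to position, are a single addition or subtraction, regardless of how many elements the column has.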

It turned out that the SQL implementation in MonetDB only really needed head columns of type VOID. Columns in an SQL table are of course all the same size (number of elements) and of disparate types. The way these columns are stored is by using one BAT per column with the data in the tail of the BAT, and a VOID head column. The values in a row of an SQL table all have the same (virtual) head value to enable their reconstruction.
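As an illustration of how rows can be put back together from such columns, consider the following C sketch. It assumes a toy table with an integer column and a floating point column; all names (toy_int_column, toy_dbl_column, toy_row, toy_fetch_row) are hypothetical and do not come from the MonetDB code base. Each column keeps only its tail values plus the seqbase of its virtual head, and a row is found by turning the shared head value into an array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_oid;   /* hypothetical stand-in for the OID type */

/* One toy column: a virtual head (just hseqbase) plus a stored tail. */
struct toy_int_column {
    toy_oid hseqbase;     /* first virtual head value */
    size_t count;         /* number of elements */
    const int32_t *tail;  /* the actual column data */
};

struct toy_dbl_column {
    toy_oid hseqbase;
    size_t count;
    const double *tail;
};

struct toy_row {
    int32_t id;
    double price;
};

/* Rebuild the row whose virtual head value is oid.
 * Returns 0 on success, -1 if oid is outside either column. */
int toy_fetch_row(const struct toy_int_column *ids,
                  const struct toy_dbl_column *prices,
                  toy_oid oid, struct toy_row *out)
{
    assert(out != NULL);
    if (oid < ids->hseqbase || oid - ids->hseqbase >= ids->count)
        return -1;
    if (oid < prices->hseqbase || oid - prices->hseqbase >= prices->count)
        return -1;
    out->id = ids->tail[oid - ids->hseqbase];
    out->price = prices->tail[oid - prices->hseqbase];
    return 0;
}
```

Because the head values are never stored, the lookup costs one subtraction per column.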

Five years ago, we decided that we would move toward a fully single column format. A BAT was to be reduced to a single (tail) column, producing a "pure" column store layout. The head column would thenceforth always be VOID and not stored. At the time there was still a lot of code that worked with the old two-columns-per-BAT format, so we slowly worked on transforming the code to work on VOID-headed BATs and only produce VOID-headed BATs as results.

Today, as of the Dec2016 release, this work is finished.

There is no "head" column anymore. The only part of the old head column that still exists today is the head seqbase, the first value of the virtual OID sequence that was the head column.

In order to get to this stage, a number of things had to be changed since the Jun2016 release. Here is a summary.

The function BATseqbase has been replaced by a set of two new functions: BAThseqbase and BATtseqbase. BATseqbase was used to set the seqbase of a VOID head column, but also, when used in combination with BATmirror, to set the seqbase of a VOID tail column. The new function BAThseqbase sets the seqbase for the new column-less head, and BATtseqbase sets the seqbase of a VOID tail column.

The function BATkey used to set or clear a flag on the head column to indicate that all values in the column were distinct. Since the VOID head column has distinct values by definition, this function was only useful to set the flag on the tail column. This was done using BATmirror. Now the function works directly on the tail column.

The function BATmirror is gone. BATmirror was used to swap the head and tail columns of a BAT, but since there is no head column anymore, the function no longer makes sense. The few places where it was used internally were things like setting a seqbase on a VOID tail column with BATseqbase (we now have BATtseqbase for that) and indicating that all values in a column were unique by calling BATkey (which was changed to work on the tail column).

Functions and macros to retrieve properties of the head column (BAThordered, BAThrevordered, BAThdense, BAThvoid, BAThkey, BAThtype) are not needed anymore since we know the result (true, true if zero or one element, true, true, true, and void, respectively). These functions and macros are now gone.
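Why these answers are fixed can be checked with a small C sketch that materializes a dense sequence and tests the properties the slow way. The helper names (toy_materialize, toy_is_ordered, toy_is_revordered, toy_is_key) are invented for this example and are not the removed MonetDB functions; we assume "ordered" means non-decreasing and "revordered" means non-increasing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write the dense sequence seqbase, seqbase+1, ... into buf. */
void toy_materialize(uint64_t seqbase, size_t n, uint64_t *buf)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = seqbase + i;
}

/* True if the values never decrease. */
bool toy_is_ordered(const uint64_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i])
            return false;
    return true;
}

/* True if the values never increase. */
bool toy_is_revordered(const uint64_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] < v[i])
            return false;
    return true;
}

/* True if all values are distinct (simple quadratic check). */
bool toy_is_key(const uint64_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return false;
    return true;
}
```

For any dense sequence, toy_is_ordered and toy_is_key return true, and toy_is_revordered returns true only when there are fewer than two elements, which matches the list above.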

The function BATnew was used to create a new BAT. Its parameters were the type of the head column (which had to be TYPE_void), the type of the tail column, the expected size, and the role (PERSISTENT or TRANSIENT). Since the first argument had to be TYPE_void, there was no point in specifying it. This function was replaced by COLnew, which has a different first parameter: it takes the (initial) seqbase of the head column sequence. This also means that most of the time there is no need to call BAThseqbase to set this seqbase after creation.

Functions and macros to access values in the head column are now also not needed anymore, and hence have been removed. They are BUNhead, BUNhloc, BUNhvar, BUNhpos, Hloc, and Hpos.

It used to be that the head and tail columns could each be given a name. In practice, only the head column was ever given a name, which was used as the name of the BAT. Now it is only possible to name a BAT, i.e., there is only a single name available per BAT.

[1] Kersten, M.L., Plomp, S., & van den Berg, C.A. (1992). Object Storage Management in Goblin. In Proceedings of International Workshop on Distributed Object Management 1992 (pp. 100–116). Morgan Kaufmann.