Roadmap 2025

 

MonetDB Releases

Next to the usual work on new (SQL) features and (performance) improvements of existing features, 2025 is to be a year of major changes in the MonetDB kernel. Therefore, we have planned two major releases for this year, so that we have a stable major version before releasing the kernel changes.

Major releases

  • External data loaders

We’re gradually adding more loaders to read in external data as a table on-the-fly, e.g. in a SELECT query. In 2025, we’ll release the first two schemes: “monetdb:” to connect to a remote MonetDB server and “odbc:” to connect to any ODBC data source.

We’ll also release a better implementation of COPY INTO, in particular for BINARY and ON CLIENT data loading.

  • SQL features

Many convenient and popular SQL features will be released. The major ones include support for recursive Common Table Expressions (CTEs), which make querying hierarchical data easier.
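To illustrate why recursive CTEs help with hierarchical data, here is a minimal sketch that walks a reporting hierarchy with the standard `WITH RECURSIVE` form. It runs against an in-memory SQLite database purely as a stand-in, since the MonetDB feature is not yet released; the `staff` table, its columns and the `subordinates` helper are hypothetical names made up for this example.

```python
import sqlite3


def subordinates(conn, root_id):
    """Return (id, name, depth) for root_id and everyone reporting below it."""
    query = """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- anchor member: the starting row
            SELECT id, name, 0 FROM staff WHERE id = ?
            UNION ALL
            -- recursive member: direct reports of rows found so far
            SELECT s.id, s.name, c.depth + 1
            FROM staff AS s JOIN chain AS c ON s.manager_id = c.id
        )
        SELECT id, name, depth FROM chain ORDER BY depth, id
    """
    return conn.execute(query, (root_id,)).fetchall()


# Hypothetical sample data: Ada manages Bob and Dee, Bob manages Cy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Bob", 1), (3, "Cy", 2), (4, "Dee", 1)],
)
rows = subordinates(conn, 1)
```

Without recursion, the same result would need one self-join per level of the hierarchy, and the number of levels would have to be known in advance.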

The ALTER TABLE statement will be extended with ALTER COLUMN to allow changing column types in-place with a single statement.

INSERT, UPDATE and DELETE will be extended with a RETURNING clause, which returns the modified records. SELECT-like expressions in the RETURNING clause can query these records directly. This is a non-standard but very common SQL extension.

In addition, the interface of the query plan inspection tools PLAN and EXPLAIN will be revamped. There will be a single EXPLAIN keyword that supports more options, so that the results of more intermediate steps of query plan generation can be examined.

  • Performance improvements

Performance of some window functions, ordered aggregate functions and string functions will be improved. In particular, we’re implementing a new bigram-based algorithm to speed up the CONTAINS filter function for strings.
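The general idea behind a bigram-based filter can be sketched as follows. This is not the MonetDB algorithm, whose details are not described here, but a generic illustration under the assumption that bigrams act as a cheap prefilter: a string can only contain the search term if it contains every two-character sequence of that term. The class and function names are hypothetical.

```python
def bigrams(text):
    """Return the set of all two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


class BigramIndex:
    """Toy column of strings with a precomputed bigram set per value."""

    def __init__(self, values):
        self.values = list(values)
        self.signatures = [bigrams(v) for v in self.values]

    def contains(self, needle):
        """Return the row numbers whose value contains needle."""
        needed = bigrams(needle)
        hits = []
        for row, (value, signature) in enumerate(
            zip(self.values, self.signatures)
        ):
            # Cheap test first: every bigram of the needle must be present.
            # Only rows passing it pay for the exact substring check.
            if needed <= signature and needle in value:
                hits.append(row)
        return hits


index = BigramIndex(["monetdb", "mondb", "database", ""])
matches = index.contains("db")
```

The exact substring check remains necessary because the bigram test can yield false positives; for search terms shorter than two characters the bigram set is empty and the sketch simply falls back to the exact check.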

Flushing of the SQL logs will be faster, because we’re adding a bitmap to indicate the to-be-flushed files, instead of flushing all files.
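A minimal sketch of the bitmap idea, assuming files are identified by small integer numbers: each write sets the bit of the file it touched, and a flush visits only the files whose bit is set. The `DirtyBitmap` class and its methods are hypothetical and do not mirror MonetDB internals.

```python
class DirtyBitmap:
    """Tracks which of n_files files have pending changes, one bit per file."""

    def __init__(self, n_files):
        self.n_files = n_files
        self.bits = 0  # a Python int used as an arbitrary-width bitmap

    def mark(self, file_no):
        """Record that file_no has been modified since the last flush."""
        if not 0 <= file_no < self.n_files:
            raise IndexError("file number out of range")
        self.bits |= 1 << file_no

    def drain(self):
        """Return the file numbers to flush and clear the bitmap."""
        dirty = [i for i in range(self.n_files) if (self.bits >> i) & 1]
        self.bits = 0
        return dirty


bitmap = DirtyBitmap(8)
for touched in (5, 1, 5):
    bitmap.mark(touched)
to_flush = bitmap.drain()  # only these files need to be flushed
```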

We’re greatly reducing the number of times the full query parse tree is visited during query compilation, optimization and plan generation. Together with some other small improvements, this will enable MonetDB to handle some extremely large queries, such as those generated by some BI applications, which can contain tens of thousands of AND and OR expressions.

  • Memory allocator framework

This has been a multi-year project, and we’ll finally be able to release it later in 2025. The new framework creates multiple allocators, one for each main component of a MonetDB server, e.g. the storage layer and the execution layer. Each allocator preallocates a big chunk of memory and hands out unused pieces of this chunk to serve malloc calls. Doubly linked lists keep track of allocations and frees, and are used to merge freed pieces occasionally to avoid fragmentation.
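The mechanism described above can be sketched in a few lines. The sketch below simulates one allocator over a range of offsets instead of real memory, uses first-fit placement, and merges neighbouring free pieces immediately on every free; all three are simplifying assumptions, and the `Block` and `Arena` names are hypothetical.

```python
class Block:
    """A contiguous piece of the preallocated chunk, linked to its neighbours."""

    def __init__(self, offset, size):
        self.offset, self.size, self.free = offset, size, True
        self.prev = self.next = None


class Arena:
    """One allocator: a doubly linked list of blocks covering the chunk."""

    def __init__(self, capacity):
        self.head = Block(0, capacity)
        self.live = {}  # offset -> Block, for blocks currently handed out

    def alloc(self, size):
        """Return the offset of a piece of `size` units, or None if full."""
        if size <= 0:
            raise ValueError("size must be positive")
        block = self.head
        while block is not None:
            if block.free and block.size >= size:
                if block.size > size:
                    self._split(block, size)
                block.free = False
                self.live[block.offset] = block
                return block.offset
            block = block.next
        return None

    def release(self, offset):
        """Free the piece at offset and merge it with free neighbours."""
        block = self.live.pop(offset)
        block.free = True
        if block.next is not None and block.next.free:
            self._absorb(block, block.next)
        if block.prev is not None and block.prev.free:
            self._absorb(block.prev, block)

    def _split(self, block, size):
        """Cut block down to size; the remainder becomes a new free block."""
        rest = Block(block.offset + size, block.size - size)
        rest.prev, rest.next = block, block.next
        if block.next is not None:
            block.next.prev = rest
        block.next = rest
        block.size = size

    def _absorb(self, left, right):
        """Merge right into left and unlink right from the list."""
        left.size += right.size
        left.next = right.next
        if right.next is not None:
            right.next.prev = left

    def layout(self):
        """Return (offset, size, free) for every block, in address order."""
        out, block = [], self.head
        while block is not None:
            out.append((block.offset, block.size, block.free))
            block = block.next
        return out
```

Because each block knows both neighbours, merging on free takes constant time; without the backward link, finding the preceding block would require a scan from the head of the list.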

Minor releases

Before the first major release this year, we’ll probably have one bugfix release for Aug2024. For the two major releases of this year, we might have two or three bugfix releases in total.

Multi-year Development

  • Parallel pipeline

We will continue the development of the parallel pipeline engine in 2025.

In the past year, much work has been done on the general execution framework of the engine and on pipelined versions of many major relational operators, such as top N and GROUP BY. In 2025, we’ll focus on implementing pipelined versions of the various JOIN operations.

  • Nested data type

We’re implementing native support for nested data (that is stored as JSON files). Instead of storing such data as simple strings, which can be inefficient for both storage and query processing, we decompose nested data into columns of corresponding types. We use several additional (internal) columns to denote the values’ original positions so that we can stitch the records or rows together when needed.

With this native support, both storage and processing of nested data can automatically benefit from the data compression and query optimisation algorithms in MonetDB. In addition, the native support opens up future opportunities for specialised optimisation techniques. So this work is an important step toward supporting embedding vectors and graphs.
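The decomposition into typed columns plus position columns can be illustrated with a small sketch. It assumes records with one scalar field and one nested array; the field names, the `parent` and `pos` column names and both functions are hypothetical and only show the principle, not the internal MonetDB layout.

```python
def decompose(records):
    """Split nested records into flat columns.

    Returns the parent columns (one entry per record) and the child columns
    (one entry per array element), where `parent` and `pos` remember which
    record an element came from and where it stood in the array.
    """
    parent_cols = {"name": []}
    child_cols = {"parent": [], "pos": [], "value": []}
    for row, record in enumerate(records):
        parent_cols["name"].append(record["name"])
        for pos, tag in enumerate(record["tags"]):
            child_cols["parent"].append(row)
            child_cols["pos"].append(pos)
            child_cols["value"].append(tag)
    return parent_cols, child_cols


def stitch(parent_cols, child_cols):
    """Rebuild the original records from the flat columns."""
    records = [{"name": name, "tags": []} for name in parent_cols["name"]]
    # Sort by (parent, pos) so the result does not depend on storage order.
    order = sorted(
        range(len(child_cols["value"])),
        key=lambda i: (child_cols["parent"][i], child_cols["pos"][i]),
    )
    for i in order:
        records[child_cols["parent"][i]]["tags"].append(child_cols["value"][i])
    return records


docs = [
    {"name": "a", "tags": ["x", "y"]},
    {"name": "b", "tags": []},
    {"name": "c", "tags": ["z"]},
]
parents, children = decompose(docs)
restored = stitch(parents, children)
```

Each resulting column holds values of a single type, which is what allows ordinary column compression and query optimisation to apply to the nested values.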

  • Embedding vector search

A first step toward efficient support of search queries on embedding vectors is a well-designed storage scheme that can help accelerate the search. Therefore, we’re adding a new VECTOR data type specifically for embedding vectors. A hybrid storage scheme, inspired by PDX, will be implemented, in which correlated dimensions are stored together to aid the nearest neighbour algorithms.
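One possible reading of such a hybrid scheme is sketched below: vectors are grouped into blocks, and within a block the values are stored dimension by dimension, so that a distance computation scans one dimension of many vectors at a time. This is an illustrative assumption, not the planned MonetDB format; the block size, the function names and the use of squared Euclidean distance are all choices made for the example.

```python
def to_blocks(vectors, block_size):
    """Group vectors into blocks and store each block dimension-major.

    blocks[b][d][i] is dimension d of the i-th vector in block b.
    """
    blocks = []
    for start in range(0, len(vectors), block_size):
        chunk = vectors[start:start + block_size]
        dims = len(chunk[0])
        blocks.append([[v[d] for v in chunk] for d in range(dims)])
    return blocks


def nearest(blocks, query, block_size):
    """Return (vector id, squared distance) of the vector closest to query."""
    best_id, best_dist = None, float("inf")
    for b, block in enumerate(blocks):
        n = len(block[0])
        partial = [0.0] * n  # running squared distance per vector in block
        for d, column in enumerate(block):
            q = query[d]
            # One tight loop over a contiguous run of same-dimension values.
            for i in range(n):
                diff = column[i] - q
                partial[i] += diff * diff
        for i, dist in enumerate(partial):
            if dist < best_dist:
                best_id, best_dist = b * block_size + i, dist
    return best_id, best_dist


data = [[0, 0], [3, 4], [1, 1], [10, 10], [2, 2]]
blocks = to_blocks(data, block_size=2)
hit_id, hit_dist = nearest(blocks, [1.2, 0.9], block_size=2)
```

Keeping the per-vector partial distances while scanning dimension by dimension is also what would allow a search to discard unpromising vectors before all dimensions have been read, although the sketch does not implement such pruning.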

Wish List

The items on our wish list have been described in the roadmap of 2024.

  • Dynamic numerical columns

  • Encrypted column storage

  • Time series support

  • Graph support

In 2025, we’ll work on some aspects of these items, in particular a new storage scheme to accelerate embedding vector search and native JSON support, which are related to “dynamic numerical columns” and “graph support” respectively.