“An embedded database system is a database management system (DBMS) which is tightly integrated with an application that requires access to stored data, such that the database system is ‘hidden’ from the application’s end-user and requires little or no ongoing maintenance.” [wikipedia]
The embedded database market has for a long time been dominated by traditional row-stores, often with limited SQL functionality, scalability, or performance, and where a code footprint in the order of a few KBs is mandatory. Furthermore, most embedded database offerings are still proprietary. This market lacks a mature open-source SQL DBMS, something which has been deemed essential to push the performance envelope for data analytics applications.
What is the MonetDB solution?
Wouldn’t it be great if a full-fledged analytics SQL database server could be harnessed to the point at which it can work as an embedded DBMS? Equipped with a fully parallel execution engine to exploit the common multi-core platforms, it would address the need for data analytics and machine learning without relying on expensive interprocess communication or ETL. It would be strongly embedded in the application program to easily use the UDFs at close range without overhead in data exchanges. All this based on international award-winning technology and enterprise-grade hardened code. Look no further: MonetDB/e has arrived.
How we got there?
The solution brought to you can be seen above. On the left, we see MonetDB in the role of a client/server architecture as it has worked all along. The interaction between the applications and the server is regulated using standard APIs, such as JDBC/ODBC, and the MonetDB API.
In this architecture, the underlying assumptions are:
- The server is fully isolated from the application code for security, stability, reliability
- The server serves multiple users concurrently
- The server manages a large shared storage space
- The server can use replication for resilience and load balancing
- The server can communicate with its peers to execute queries over partitioned tables.
What’s new is that a large part of the server code has been decoupled to arrive at an embeddable SQL database system, called MonetDB/e. This is shown on the right. A single library to be linked into the application code directly suffices. The benefits are obvious:
- No client-server communication overhead
- No result-set serialisation, but binary columnar access
- Full use of the multi-core parallel query execution
- Seamless integration with the programming language, such as Dataframes and NumPy
- Single user control over the resources
- Work with :memory: databases with controlled RAM footprint
- Hybrid set up with concurrent :memory: and server-based storage
- Boost your data analytics programs with stateful User-Defined Functions
- Statistics, windowing functions, grouped sets, cube, and rollup
How does it work?
MonetDB/e is a serverless solution. It encapsulates the full SQL functionality offered by MonetDB, including a rich collection of data types, triggers, schema management, and persistent stored modules. To achieve this, we created a single shared library (libmonetdbe.so) to expose the main SQL interfaces using a small C-api (monetdbe.h). The proof of the pudding is in the eating, which meant that we also made the MonetDB server version fully dependent on libmonetdb.so! This avoids the need to maintain different code paths for the server and embedded solutions. The time has now come to switch to the new product and use it in areas hitherto seemingly a barren playing field.
What should you expect?
The MonetDB/e library supports three different storage schemes. A lightweight scheme for small databases confined to main memory, i.e. monetdbe_open(‘:memory:’). The maximum footprint can be set explicitly or is derived from the host (VM) resource limits. Using a directory on your local machine is the way to go if you need persistence, but with exclusive access as it concerns transactional properties. Last but not least, the API can also be used to interact with a MonetDB server, just by providing a URL with the credentials.
The pre-release of MonetDB/e is available for external use. To try it, check out the MonetDB default branch from its GitHub mirror. Build the system to create the library libmonetdbe.so. This can be linked to your C application. The embedded version is thread-safe. It can switch between persistent local stores at lightning speed. There is no such thing as replication or distributed queries over partitioned tables stored in different MonetDB/e :memory: instances. These features can be added upon request and if and when it becomes a recognised priority in the community take-up of the system.
Alternatively, check out the binary wheel monetdbe from PyPi for your Python application. They have been prepared for OSX and Linux, and Windows soon. In the next post, we will describe the Python language embedding in more detail.
Several rough edges and missing features will be dealt with before the MonetDB/e library becomes an official feature release later this summer, but users are more than welcome to try this pre-release. Please drop us a line about your experiences - your feedback is really important to what we do - or retweet this message to those that might need to know.
The MonetDB team
July 8, 2020