The MonetDB/e Python interface for data analytics

python-logo

Harnessing the power of an analytical database server into an embeddable library is a major step on the road to getting the optimal benefit from its performance advantages. MonetDB/e is at the core of MonetDB, which is written in C. The MonetDB system provides APIs to Python, Java, Ruby, R, PhP, … They all rely on the JDBC/ODBC layer for interaction with the server. Such a bridge within an embedded kernel and your application code is evidently not the way to go.

Therefore, our next target was to make the MonetDB/e functionality available as a simple drop-in library in Python 3, which has a clearly defined database interface similar to SQLite. With a claimed installed base of SQLite of more than one billion, there is a lot of code out there and users are accustomed to its programmatic interface. With this in mind, our goal was to follow the Python/SQLite 3 interface as much as possible.

We followed a test-driven approach by starting with the test suite for Python/SQLite 3 and working our way through all the cases covered. Soon we were able to cover over 80% of the relevant unit tests. Those remaining are assigned to the milestones laid out in the project issues roadmap.

What are the perks?

Since the package effectively encapsulates all of the MonetDB server code, it immediately provides the following features

  • SQL standard compliant (compatible with Postgresql)
  • SQL triggers and persistent stored modules
  • Full transaction management support, hot snapshots and checkpoints
  • Optimised and parallel query processing
  • Hassle-free index creation and maintenance
  • Extensive data types and operators, e.g. boolean, temporals, DECIMAL, JSON, UUID
  • Windowing functions, grouping sets, rollup, cube operations
  • Integration with Pandas, Numpy, and user-defined functions.

The Python library is extensive and is also focussed on coping with the limitations of SQLite 3. For example, converter and adapter functions in SQLite3 are not (yet) supported. They coped with the limited set of data types supported in SQLite in a Python context. MonetDB has a much richer built-in type system.

Likewise, transaction management in MonetDB/e is based on MVCC without explicit levels of isolations. You can completely control the transaction state by explicitly issuing START TRANSACTION, ROLLBACK, SAVEPOINT and COMMIT statements in your code.

Where is the database?

One of the key factors in an embedded database system is the location and capacity of the persistent data store if it needs to persistent at all. The easiest way is to start with :memory:, which creates an in-memory database limited by the amount of RAM you reserve for it. The next step is to use a directory on your local disk drives to create a persistent store. By changing a single line, i.e. the connect() call, you can even switch to a MonetDB server running in the background. In MonetDB/e we opted for a simple connection interface based on URIs, such as:memory:, /OSpath/directory?withoptions, and monetdb://mapi_host:port/db?withoptions.

How does it work?

Using the library starts with simply getting the module into your Python environment, e.g. pip install monetdbe. A minimal example to see if everything works as expected:

import monetdbe
 conn = monetdbe.connect(':memory:')
 c = conn.cursor()
 c.execute('SELECT count(*) FROM tables')
 print(c.fetchall())

Trying out a new system calls for examples and a migration path. We have collected a small collection of examples, e.g. helloworld.py and imdb.py, in the monetdbe-examples repository for inspiration. A synopsis of the discrepancies you may encounter in the conversion of your existing applications to run with MonetDB/e is documented in the migrations section.

And how about debugging and stability?

Using an embedded database library also comes with mutual responsibility. Unlike a server solution, your database could become corrupted with peek/poke in the underlying database structures. If you stick to the official API, this should not happen. Debugging the Python layer is straightforward. We do not, however, recommend going deeper into the MonetDB/e C-based kernel as you might easily get lost. This is better pursued in cooperation with us.

What are the other caveats?

There are a few more caveats to the approach presented. SQLite and MonetDB/e do not align 100% on the same interpretation of the SQL standard. MonetDB/e is more strict. This may result in minor differences in error handling or even some surprises in the results because SQLite is less strict in its interpretation, data precision and null handling in aggregates. Furthermore, unlike SQLite, the table columns are rigidly typed. Foreign key references are enforced upon transaction commit.

The Python package monetdbe is available from PyPi for Linux, Windows and Mac OS X. You don’t need to install or compile the MonetDB code itself. And of course, it is free, MPL2.0 licensed software. Users are more than welcome to try it out. Please drop us a line about your experiences - we’re really keen to hear what you think - and pass on this message to those who can benefit from the new kid on the block.