One of the most frequent questions we get is "Why can't I use multiple threads to update a table in MonetDB?" The reason is that the transaction scheme MonetDB uses is Optimistic Concurrency Control (OCC for short). This article aims to provide a high level explanation of what OCC is, the trade-offs involved, and the design goals that finally led the MonetDB developers to implement OCC instead of some other kind of transaction management.
Before we continue let us make explicit the rules of the game: A client starts a transaction that either reads from, or writes to one single table in the database but not both. This model might seem restrictive, but allows for the concepts to be explained more clearly and can be easily generalized.
The problem addressed by transaction schemes in Relational Database Management Systems (RDBMSs for short) is the preservation of the database consistency in the presence of multiple concurrent clients. See the wikipedia lemma on ACID for more information about database consistency. Specifically when two different clients are trying to write to the same table there needs to be some kind of coordination if we are to keep our data consistent. See this document for an introductory exposition of the problems that might arise in practice.
Traditionally, transaction management in RDMBSs is implemented using locks, which allow only one user at any given time to modify a database object. A common scheme is to have two different locks: an exclusive write lock and a shared read lock. If a client needs to change a table, it acquires the exclusive lock and no other client can interact with that table until the lock is released. When a client needs to read from a table it acquires the shared lock. This allows other clients to acquire the shared lock at the same time and read from the table, but prevents clients from acquiring the exclusive lock and therefore prevents any change to the table. This scheme allows at any one time just one client with write access to the table, or multiple clients with read access, but not both.
What is Optimistic Concurrency Control?
Optimistic Concurrency Control (OCC) is a transaction scheme that guarantees the ACID properties without using locks. Conceptually (the actual implementation of the MonetDB transaction manager will not concern us in this article) each transaction works on its own copy of the database. When a transaction tries to commit, the transaction manager checks if there are conflicts, i.e. if another transaction has changed the table since the current transaction started. If no conflicts are detected, then the transaction is committed, otherwise it fails and needs to be rolled back.
Notice that under this scheme any number of concurrent read only transactions will succeed. On the other hand only one write transaction will succeed (the first one to commit) and all the other concurrent write transactions that try to commit after the successful write transaction will fail. Read transactions will not fail, but will present an out of date view of the table. This is the reason you get the error message that your write transaction failed and it needs to be rolled back: Some other concurrent transaction changed something and reached the commit point before the failed one.
Two important points need to be made here about MonetDB's OCC implementation:
- Concurrent updates on different tables are allowed under this scheme. There is no conflict if two concurrent transactions change different tables in the database.
- Data Definition Language commands (
ALTER), always conflict with concurrent write transactions. The reason for this has to do with how the SQL catalog is implemented in MonetDB.
Other causes for transaction failures
The transaction manager of MonetDB also requires a rollback of a transaction if any error happened during its execution. For example, if you try to insert a tuple with a primary key already in the table, the primary key constraint is violated and the transaction needs to be rolled back.
sql>create table lala (i int primary key); operation successful sql>insert into lala values (1), (2); 2 affected rows sql>select * from lala; +------+ | i | +======+ | 1 | | 2 | +------+ 2 tuples sql>start transaction; operation successful sql>insert into lala values (1); INSERT INTO: PRIMARY KEY constraint 'lala.lala_i_pkey' violated sql>insert into lala values (3); Current transaction is aborted (please ROLLBACK) sql>commit; COMMIT: transaction is aborted, will ROLLBACK instead
To illustrate with some other commonly used RDBMSs, PostgreSQL works the same way:
postgres=# create table lala (i int primary key); CREATE TABLE postgres=# insert into lala values (1), (2); INSERT 0 2 postgres=# select * from lala; i --- 1 2 (2 rows) postgres=# begin transaction; BEGIN postgres=# insert into lala values (1); ERROR: duplicate key value violates unique constraint "lala_pkey" DETAIL: Key (i)=(1) already exists. postgres=# insert into lala values (3); ERROR: current transaction is aborted, commands ignored until end of transaction block postgres=# commit; ROLLBACK
MariaDB on the other hand is not bothered by errors inside a transaction:
MariaDB [foo]> create table foo (i int primary key); Query OK, 0 rows affected (0.05 sec) MariaDB [foo]> insert into foo values (1), (2); Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0 MariaDB [foo]> start transaction; Query OK, 0 rows affected (0.00 sec) MariaDB [foo]> insert into foo values (1); ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY' MariaDB [foo]> insert into foo values (3); Query OK, 1 row affected (0.00 sec) MariaDB [foo]> commit; Query OK, 0 rows affected (0.00 sec) MariaDB [foo]> select * from foo; +---+ | i | +---+ | 1 | | 2 | | 3 | +---+ 3 rows in set (0.00 sec)
Why does MonetDB use OCC?
MonetDB is optimized for On-Line Analytical Processing (OLAP) workloads. This means a lot of expensive read only queries. The typical use of MonetDB is that the user spends some time loading a large amount of data in a small number of transactions (ideally one) and then uses the database to run a lot of big read only queries (big joins, aggregations, etc).
Given this usage profile it makes sense to avoid as many locks as possible. While acquiring a lock is not very expensive by itself, the cost quickly adds up if you run a large number of queries. MonetDB trades the throughput of write transactions, with reduced cost of read only transactions.
Under OCC it is clear that workloads with concurrent updates might suffer. Ideally the user should load the data into MonetDB in a small number of big transactions that run isolated.
If this is not possible due to the constraints of the application, care should be taken to run transactions in a retry wrapper, i.e. in code that checks for transaction failures and retries a transaction if it failed.
To partially cope with write heavy On-Line Transaction Processin (OLTP) workloads in MonetDB, one might consider using the oltp_pipeline optimizer available in the distribution. This optimizer analyses each SQL query for update statements against the persistent tables. It then injects advisory exclusive column locks to a lock manager, which effectively serializes the updates. These locks are held for a limited time, and after this time has elapsed the update query continues under the OCC scheme.
Please note that the oltp_pipeline optimizer is experimental and has not been tested extensively, so it might contain bugs that we don't know about. Also keep in mind that it will almost have an impact on performance. We encourage users with hybrid OLTP/OLAP workloads to provide us with feedback on this new feature.