The transaction management scheme deployed in MonetDB sometimes creates confusion with application developers. In particular how are transactions committed, persisted and isolated to guarantee the ACID properties of SQL. Most of the confusion stems from the difference between OnLine-Analytical-Processing (OLAP) and OnLine-Transaction-Processing (OLTP). MonetDB is primarily an analytical (OLAP) database, where most operations are SELECT-FROM-WHERE-GROUPBY queries. With that in mind, its transaction management scheme is tuned for reading large chunks of data, rather than writing small chunks of data at high speed concurrently. Nevertheless, MonetDB SQL obeys the transaction semantics of the SQL standard.
The MonetDB transaction management is based on optimistic concurrency control (OCC). OCC is particularly useful for read-heavy applications, with the design that transaction management overhead should only be paid when necessary. While OCC does provide each transaction with a consistent view on the database, updates are collected in an addendum. The addendum is then processed on transaction commit. If at commit time the tables affected by the transaction have not changed, the results are merged. Otherwise the transaction is aborted and the pending updates discarded. As a result, OCC can negatively impact long running transactions, as these can be affected by concurrent updates on their underlying tables. The same holds for applications that try to perform concurrent updates to the same tables from multiple application threads or to create/modify the database schema concurrently. To avoid such unexpected transaction aborts, the transactions should be internally serialized by the application developer.
The MonetDB SQL layer runs in auto-commit mode by default. This means that the database will commit transactions as
frequently as possible, usually after each and every SQL statement or catalog change. Auto-commit can be disabled by
setting the session variable auto_commit to false. By disabling auto-commit, one must manually mark the beginning of
a transaction with START TRANSACTION and terminate it with COMMIT. Alternative, calling ROLLBACK will revert all
operations since the beginning of the transaction. This mode is useful for bulk-loading large datasets with simple
insert statements. When a transaction is committed, its changes are written out to the MonetDB transaction log or
Write-Ahead Log (WAL). Actually the commit will not succeed until the transaction is fully persisted in the WAL.
Committed transactions are also immediately applied to the logical store, thus visible to all following transactions
and all clients querying the database. At certain intervals, if the database is idle, the store manager will merge the
updates from the WAL into the primary storage - the Binary Association Tables (BATs). By default, persisted transactions
are cleaned up from the WAL. During database startup, all transaction still left in the WAL are read and any that are
not yet applied are written into the persistent BAT storage. Hence, in case of failure, all transactions that were reported
as committed will indeed be present, once the database is made available to the user. The WAL is written to disk, where
every transaction is stored in a separate file. By default there is only one transaction file in the WAL directory - the current, not yet committed one. The WAL directory is by default found under <database directory="">/sql_logs
.
Inside it is a directory sql for the SQL module transactions. Different modules/top-level interfaces can have different
WAL directories. The inner directory contains the following files: log - the WAL catalog files The first line of the file
is the internal MonetDB kernel WAL version The next line is empty The third line is the latest transaction id, stored as
a Long Long number. log. - each of these files stores a single MonetDB transaction Part of the file
content is binary It is highly advisable never to manually edit the MonetDB WAL files.
There is ongoing work on a transaction replication scheme for MonetDB. With transaction replication, replica instances of MonetDB can be configured, ensuring higher availability of the system and improved level of concurrency. More information about the architecture of transaction replication in MonetDB can be found in the paper “Single-click to data insights: transaction replication and deployment automation made simple for the cloud age”. We plan to have the feature ready in the feature release following Jul2015.