Performance Tips

There are many ways to influence performance in MonetDB. Some of them are generic SQL features; others are specific to MonetDB. Here you will find an overview of features and tips that can be used to tweak performance.

Data Modelling tips:

  • Specify NOT NULL for columns that must not contain NULL values.

  • Define a primary key for each table and foreign keys in tables referencing primary keys.
    A primary key definition will implicitly create an internal Hash index.
    A foreign key definition will implicitly create an internal Join index and speed up joins between two tables when they are joined on their primary and foreign key columns.
    Try to keep the primary keys (and thus the foreign keys) as small as possible, preferably a single integer column. For compound keys, consider introducing a surrogate key.
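    A minimal sketch combining the NOT NULL and key tips above (all table and column names are illustrative):

        CREATE TABLE customers (
          id   INT          NOT NULL PRIMARY KEY,  -- single integer key: small and fast
          name VARCHAR(100) NOT NULL
        );
        CREATE TABLE orders (
          id          INT NOT NULL PRIMARY KEY,
          customer_id INT NOT NULL REFERENCES customers (id)  -- foreign key speeds up the join
        );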

  • Normalize columns that combine multiple values (often stored as a string in a certain format). For instance, date values are sometimes stored as strings in a country-specific format (e.g. January 11, 2006). It is better and more efficient to store each value in a separate column with a proper (numeric or date/time) data type. This also avoids repeating the costly string extraction and data conversion in every query.
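    A hedged sketch, assuming a hypothetical events table whose event_date_str column holds strings such as 'January 11, 2006', using the built-in str_to_date function:

        ALTER TABLE events ADD COLUMN event_date DATE;
        -- parse the country-specific string once instead of in every query
        UPDATE events SET event_date = str_to_date(event_date_str, '%B %d, %Y');
        ALTER TABLE events DROP COLUMN event_date_str;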

  • When a table is very large and your queries often need only part of it, consider partitioning the table horizontally into multiple partition tables plus one merge table that logically recombines them. See: Data Partitioning, Updatable Merge Tables, CREATE MERGE TABLE.
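    A minimal sketch, using illustrative names:

        CREATE MERGE TABLE measurements (ts TIMESTAMP, val DOUBLE);
        CREATE TABLE measurements_2023 (ts TIMESTAMP, val DOUBLE);
        CREATE TABLE measurements_2024 (ts TIMESTAMP, val DOUBLE);
        ALTER TABLE measurements ADD TABLE measurements_2023;
        ALTER TABLE measurements ADD TABLE measurements_2024;
        -- queries against measurements now combine both partition tables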

  • If your queries perform some complex or expensive computation multiple times (or in multiple queries), it may be faster to store the computation result in an added column of the table and then query that column, eliminating the computation overhead from every query. Use ALTER TABLE statements to add the extra column and UPDATE statements to fill it with the computed values. You may also add triggers to keep the extra column up to date when new rows are inserted or existing columns are updated.
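    For example (illustrative names), a repeatedly computed total could be materialized as:

        ALTER TABLE sales ADD COLUMN total_price DECIMAL(18,2);
        UPDATE sales SET total_price = quantity * unit_price * (1 - discount);
        -- queries now read the stored value instead of recomputing it
        SELECT SUM(total_price) FROM sales;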

Runtime tips:

  • After you have changed a table's data, run the ANALYZE statement to update the table's statistics; see also Table Statistics.
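    For example, assuming an illustrative sales table in schema sys:

        ANALYZE sys.sales;                          -- refresh statistics for the whole table
        ANALYZE sys.sales (quantity, unit_price);   -- or only for specific columns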

  • To view the relational execution plan of a specific query, run it with the PLAN statement modifier (see the combined example after the TRACE tip below).

  • To view the MAL execution program of a specific query, run it with the EXPLAIN statement modifier (example below).

  • To view the performance trace of the MAL statements of a specific query, run it with the TRACE statement modifier.
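    Each modifier is simply prefixed to the query, for instance (illustrative query):

        PLAN    SELECT count(*) FROM orders WHERE customer_id = 42;  -- relational plan
        EXPLAIN SELECT count(*) FROM orders WHERE customer_id = 42;  -- MAL program
        TRACE   SELECT count(*) FROM orders WHERE customer_id = 42;  -- executes and times the MAL statements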

  • You may change the Optimizer pipeline to experiment with different sequences of optimizer steps.
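    For instance:

        SELECT * FROM sys.optimizers();          -- list the available pipelines
        SET sys.optimizer = 'sequential_pipe';   -- switch pipelines for this session
        SET sys.optimizer = 'default_pipe';      -- restore the default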

  • Consider using common table expressions (CTEs) via the WITH clause in queries that do complex processing in FROM or WHERE clauses or in subqueries.
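    A sketch reusing the illustrative tables from above (assuming sales also has a customer_id column):

        WITH big_customers AS (
          SELECT customer_id, SUM(total_price) AS revenue
          FROM sales
          GROUP BY customer_id
        )
        SELECT c.name, b.revenue
        FROM big_customers AS b JOIN customers AS c ON c.id = b.customer_id;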

  • For loading large data sets, use COPY INTO ... FROM file_name instead of INSERT INTO ... VALUES commands. See also table updates and Loading Bulk Data.
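    A minimal sketch (the file path and table name are illustrative; the file must be readable by the server):

        COPY INTO sales FROM '/tmp/sales.csv'
        USING DELIMITERS ',', E'\n', '"' NULL AS '';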

  • If you need to load large data sets via an API such as ODBC, JDBC, or DBI, use prepared INSERT statements instead of regular INSERT statements. See the PREPARE and EXECUTE statements.
    It is important to execute SET sys.optimizer = 'minimal_pipe'; before loading the data via the (prepared) insert statements.
    Disable autocommit mode and commit manually after many (> 40000 or all) rows have been inserted into the table.
    If possible, temporarily drop the PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints on the table before the data load and add them back afterwards via ALTER TABLE DROP/ADD CONSTRAINT statements. See: Loading Bulk Data.
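    A hedged sketch of the whole sequence (constraint, table, and column names are illustrative; the number after EXEC is the statement id reported by PREPARE):

        SET sys.optimizer = 'minimal_pipe';
        ALTER TABLE sales DROP CONSTRAINT sales_pk;
        START TRANSACTION;                 -- manual commit instead of autocommit
        PREPARE INSERT INTO sales (id, quantity, unit_price) VALUES (?, ?, ?);
        EXEC 2 (1, 10, 9.99);              -- repeat (via the API) for each row
        COMMIT;
        ALTER TABLE sales ADD CONSTRAINT sales_pk PRIMARY KEY (id);
        SET sys.optimizer = 'default_pipe';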