Loading using SQL

Every SQL-based application understands INSERT INTO. For moderate amounts of data its performance is sufficient as long as you take care to avoid some pitfalls. In a nutshell,

  1. Make sure to disable autocommit mode or wrap the INSERT statements in a transaction

  2. When inserting programmatically, see if you can use prepared statements.

  3. Drop all constraints on the table and restore them after the INSERTs.

  4. Set SET sys.optimizer = 'minimal_pipe'

INSERT statements

The easiest and most portable way to insert data into a table is to use the INSERT INTO SQL statement.

INSERT INTO my_table VALUES (1, 2, 3);

If table my_table has three columns, this statement inserts a new row containing the given values. If the table has more columns, the additional columns are set to NULL or a default value.

It is possible to insert multiple rows using a single statement:

INSERT INTO my_table VALUES (1, 2, 3), (4, 5, 6);

This adds two rows. It is not recommended to have more than at most a few dozen rows per INSERT statement as the parsing overhead will quickly become a bottleneck.

Run all INSERTs in a single transaction

By default, MonetDB runs in autocommit mode which means that every statement executes in a separate transaction. This adds considerable overhead if the amount of work per statement is very small, such as with an INSERT statement. Wrapping the sequence of inserts between a START TRANSACTION statement and a COMMIT makes it a single transaction so the overhead is spread out over all the inserts.

As an alternative to START TRANSACTION you can also disable autocommit mode at the connection level. With mclient, this is done using the -a flag. When using JDBC (Java), use connection.setAutoCommit(false);. With ODBC, use the function SQLSetConnectAttr(). With pymonetd autocommit is already disabled by default.

Use prepared statements

If your language bindings support it you can use prepared statements to reduce the per-insert parsing and verification overhead. A prepared statement allows you to provide the INSERT statement once, with question marks ? used as placeholders for the values to insert. After preparing, you then repeatedly bind the placeholders to new values and execute the prepared statement.

Some language bindings provide special support for prepared statements. For example, in Java:

PreparedStatement stmt = connection.prepareStatement("INSERT INTO table VALUES (?, ?, ?)");
// repeatedly:
stmt.setInt(1, /* a value */);
stmt.setInt(2, /* a value */);
stmt.setInt(3, /* a value */);
// finally:

With ODBC, using prepared statements involves a call SQLPrepare(hdl, "INSERT INTO table VALUES (?, ?, ?)", SQL_NTS) followed by repeated calls to, SQLBindParameter() and SQLExecute().

Temporarily remove constraints

Table constraints such as PRIMARY KEY, UNIQUE and FOREIGN KEY are verified on every INSERT. This becomes increasingly time consuming.

Because checking constraints in bulk is much more efficient than repeatedly checking them for a single value, it is often more efficient to simply drop the constraints before the insertions using ALTER TABLE ... DROP CONSTRAINT, then recreate them after using ALTER TABLE ... ADD CONSTRAINT.

Disable the SQL optimizer

Before execution, every SQL statement is translated into an internal language called MAL. After translation and before execution the system analyzes the MAL plan and tries to make improvements to it. This is very effective for SELECT queries, especially complex ones, but there is not much to be optimized for INSERT plans.

If you're executing many INSERTs, disabling the MAL optimizer can make a large difference. To do so, execute

SET sys.optimizer = 'minimal_pipe';

This setting applies to the current connection only, so after completing the INSERTs, close the connection or execute

SET sys.optimizer = 'default_pipe';

Performance comparison

A small unscientific benchmark loading a table with 45529 rows and 109 columns, mostly varchar and decimal:

INSERT without transaction413.00s0.10×
INSERT default_pipe43.16s1.00×
PREPARE INSERT default_pipe24.35s1.77×
INSERT minimal_pipe22.27s1.94×
PREPARE INSERT minimal_pipe14.86s2.91×
COPY INTO0.73s59.4×

The final row is for comparison with the next data loading method we'll describe, loading CSV data using COPY INTO.