Recipes book

This section contains a collection of essays that simplify the use and deployment of SQL on the MonetDB platform. The topics are inspired by ongoing projects and specific user requests.

Client Interfaces

Graphical User Interfaces

Graphical database utilities have been around for some time to support click-and-go interactions with a DBMS. They often support multiple database back-ends using a standard communication protocol, e.g. JDBC. MonetDB/JDBC makes such tools work flawlessly with the server, making your experience with MonetDB/SQL even better. Recommended GUIs are SQuirreL and DbVisualizer.

The foundation for building your own applications rests on the programming language bindings provided.

MAPI Client

The mclient program is the universal command-line tool that implements the MAPI protocol for client-server interaction with MonetDB.

On a Windows platform it can be started using Start->MonetDB->MonetDB SQL Client. Alternatively, you can use the command window to start mclient.exe. Make sure your environment variables are properly set, so that the libraries of interest can be found.

On a Linux platform it provides readline functionality, which greatly improves user interaction. A history can be maintained to ease interaction over multiple sessions.

The default settings are geared toward establishing a guest connection to an SQL database on a server running on localhost. The -h hostname option specifies on which machine the MonetDB server is running. If you communicate with a MonetDB server on the same machine, it can be omitted.

     Usage: mclient [ options ]

     Options are:
      -h hostname | --host=hostname    host or UNIX domain socket to connect to
      -p portnr   | --port=portnr      port to connect to
      -u user     | --user=user        user id
      -d database | --database=database  database to connect to
      -e          | --echo             echo the query
      -E charset  | --encoding=charset specify encoding (character set) of the terminal
      -f kind     | --format=kind      specify output format {csv,tab,raw,sql,xml}
      -H          | --history          load/save cmdline history (default off)
      -i          | --interactive      read stdin after command line args
      -l language | --language=lang    {sql,mal}
      -L logfile  | --log=logfile      save client/server interaction
      -s stmt     | --statement=stmt   run single statement
      -X          | --Xdebug           trace mapi network interaction
      -| cmd      | --pager=cmd        for pagination
      -?          | --help             show this usage message

     SQL-specific options
      -n nullstr  | --null=nullstr     change NULL representation for sql, csv and tab output modes
      -r nr       | --rows=nr          for pagination
      -w nr       | --width=nr         for pagination
      -D          | --dump             create an SQL dump
      -N          | --inserts          use INSERT INTO statements when dumping

The default TCP port (mapi_port) is 50000. If this port happens to be in use on the server machine (which generally is only the case if you run two MonetDB servers on it), you will have to use the -p port option to define the port to which the mserver is listening. Otherwise, it may be omitted. If there is more than one mserver running, you must also specify the database name with -d database. In that case, if your port points at the wrong server, the connection will be redirected to the correct one. Note that the default port (and other default options) can be set in the server configuration file.
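
For example, to reach a database served on a non-default port of a remote machine, the options are combined as follows (host, port and database name are placeholders):

     shell> mclient -h db.example.org -p 50001 -d mydatabase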

Within the context of each query language there are more options. They can be shown using the \? command or from the command line.

For SQL there are several knobs to tune for a better rendering of result tables (\w).

     shell> mclient -d database
     Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2011-SP2)
     Database: MonetDB v11.5.2, 'database'
     Type \q to quit, \? for a list of available commands
     auto commit mode: on
     sql>\?
     \?      - show this message
     \<file  - read input from file
     \>file  - save response in file, or stdout if no file is given
     \|cmd   - pipe result to process, or stop when no command is given
     \h      - show the readline history
     \t      - toggle timer
     \D table- dumps the table, or the complete database if none given.
     \d[Stvsfn]+ [obj] - list database objects, or describe if obj given
     \A      - enable auto commit
     \a      - disable auto commit
     \e      - echo the query in sql formatting mode
     \f      - format using a built-in renderer {csv,tab,raw,sql,xml}
     \w#     - set maximal page width (-1=unlimited, 0=terminal width, >0=limit to num)
     \r#     - set maximum rows per page (-1=raw)
     \L file - save client/server interaction
     \X      - trace mclient code
     \q      - terminate session

SQuirreL

SQuirreL SQL Client is a graphical Java program to view the structure of a JDBC-compliant database, browse the data in tables, issue SQL commands, etc. It is a very cool tool. It can even draw dependency graphs based on the schema and foreign keys. The latest version is available from their website. Download and install the software. The following scheme works for version 3.4.

After starting SQuirreL for the first time, it will load its default welcome screen. Locate the Drivers window and click to add a driver.

[Screenshot: the SQuirreL MonetDB driver dialog]

In the pop-up form enter the driver name MonetDB, an example URL pointing to the database of interest jdbc:monetdb://<host>[:<port>]/<database>, and the website https://www.monetdb.org/Home. Now go to Extra Class Path and add the MonetDB JDBC driver jar file. The latest version can be downloaded from the MonetDB repository, or use the one that came with your distribution, typically installed in ${prefix}/share/monetdb/lib/monetdb-jdbc-X.Y.jar. Next, click on the List Drivers button and it will insert nl.cwi.monetdb.jdbc.MonetDriver into the Class Name field. Click OK.

Now create a new Alias to set up a connection. You will have to edit the alias to direct it to the proper database server (change the URL) and to use the proper user credentials. Explore the application and have fun!

DbVisualizer

DbVisualizer is a platform-independent tool aimed at simplifying database development and management for database administrators and developers. It is a very cool tool; it can even draw dependency graphs based on the schema and foreign keys.

Free, personal-use versions are available from their website. Download and install the software. The following scheme works for version 6.1.

After starting DbVisualizer for the first time, it will load its default welcome screen and start the wizard to select a database driver. Cancel this wizard and open the Driver Manager, using the menu Tools->Driver Manager... In the Driver Manager add a new driver using Driver->Create Driver... Type the name of the driver in the Name field, e.g. MonetDB. Type the URL format for the driver: jdbc:monetdb://hostname/database. In the Driver File Paths box, follow the directions to load a JAR file that contains the JDBC driver. Use the MonetDB JDBC driver that came with your distribution, typically installed in ${prefix}/share/MonetDB/lib/monetdb-X.Y-jdbc.jar.

After adding it, the window should list the driver class nl.cwi.monetdb.jdbc.MonetDriver and automatically fill it in in the Driver Class field. Close the Driver Manager window.

From the menu select Database->Create Database Connection. A pop-up dialog will try to persuade you to use the wizard. You know better, so click "No". Fill in a name for the connection in the Alias field, e.g. MonetDB. Select the monetdb driver from the list. Copy the default URL by clicking on the "URL Format: ..." text field and change it to reflect the right hostname (usually localhost will do). Fill in the default user id and password (monetdb). Press the "Connect" button. It will report the database being used and the JDBC driver in use. In the left pane the monetdb database now becomes available for browsing.

Explore the application and have fun!

A caveat of the free version is its performance on SQL scripts. They are sent as a single string to the server for execution, which is not optimal for MonetDB. Batch scripts are better run from the MonetDB SQL client.

JDBC Client

Note: the jdbcclient tool is only provided for performing basic JDBC tests. It is not supported; use mclient instead.

The textual client using the JDBC protocol comes with several options to fine-tune the interaction with the database server. A synopsis of the calling arguments is given below.

     java -jar ${prefix}/share/monetdb/lib/jdbcclient.jar \
             [-h host[:port]] [-p port] \
             [-f file] [-u user] [-l language] [-d [database]] \
             [-D [table]] [-e] [-X<opt>]

or using the long option equivalents --host --port --file --user --language --dump --echo --database. Arguments may be written directly after the option, like -p50000.

If no host and port are given, localhost and 50000 are assumed. A .monetdb file may exist in the user's home directory. This file can contain preferences to use each time the program is started. Options given on the command line override the preferences file. The .monetdb file syntax is <option>=<value>, where option is one of host, port, file, mode, debug, or password. Note that the last one is perilous and therefore not available as a command line option. If no input file is given using the -f flag, an interactive session is started on the terminal.
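
As an illustration, a .monetdb preferences file might look as follows (the values shown are placeholders):

     host=localhost
     port=50000
     password=mysecret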

NOTE The JDBC protocol does not support the SQL DEBUG <query> option. Use the mclient tool instead.

 

OPTIONS

-h --host
The hostname of the host that runs the MonetDB database. A port number can be supplied by use of a colon, i.e. -h somehost:12345.
-p --port
The port number to connect to.
-f --file
A file name to use either for reading or writing. The file will be used for writing when dump mode is used (-D --dump). In read mode, the file can also be a URL pointing to a plain-text file that is optionally gzip-compressed.
-u --user
The username to use when connecting to the database.
-d --database
Try to connect to the given database (only makes sense if connecting to monetdbd).
-l --language
Use the given language, defaults to 'sql'.
--help
This screen.
--version
Display driver version and exit.
-e --echo
Also outputs the contents of the input file, if any.
-q --quiet
Suppress printing the welcome header.
-D --dump
Dumps the given table(s), or the complete database if none given.

EXTRA OPTIONS

-Xdebug
Writes a transmission log to disk for debugging purposes. If a file name is given, it is used, otherwise a file called monet<timestamp>.log is created. A given file will never be overwritten; instead a unique variation of the file is used.
-Xhash
Use the given hash algorithm during challenge response. Supported algorithm names: SHA1, MD5, plain.
-Xoutput
The output mode when dumping. The default is sql; xml may be used for experimental XML output.
-Xbatching
Indicates that a batch should be used instead of direct communication with the server for each statement. If a number is given, it is used as the batch size, i.e. 8000 would execute the contents of the batch after each 8000 rows read. Batching can greatly speed up the process of restoring a database dump. An invocation sketch follows below.
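
For instance, restoring a dump with batching enabled might be invoked along the following lines; the file name and batch size are illustrative, and the batch size is written directly after the option following the -p50000 convention noted above:

     java -jar ${prefix}/share/monetdb/lib/jdbcclient.jar -d database \
             -f dump.sql -Xbatching8000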

 

Loading Bulk Data

There are many ways in which you can insert data into an SQL table in MonetDB.  Here we will explain the pros and cons of various methods.

The easiest way to insert data is to use the INSERT INTO query:

INSERT INTO table VALUES (1, 2, 3);

The benefit is clear: this is very straightforward.  However, this is a seriously inefficient way of doing things in MonetDB.

MonetDB, by default, runs in auto-commit mode.  This means that each query is a database transaction.  This in turn means that each query that changes the database necessarily causes a write to the disk in order to safeguard the change.  This is a serious bottleneck.  As you can see on Wikipedia, a typical 7,200 rpm SATA drive can do about 75 to 100 I/O operations per second.  Since each INSERT is at least one I/O operation (depending on, among others, the underlying operating system), this limits throughput severely.

A much better way to do lots of inserts is to make sure the inserts are bundled into a single transaction.  This can easily be done by surrounding the inserts with a START TRANSACTION and COMMIT, as in:

START TRANSACTION;
INSERT INTO table VALUES (1, 2, 3);
...
COMMIT;

Another way of accomplishing the same result is to clear the auto-commit mode in the server.  How this is done depends on how you connect to the server.  In ODBC this can be done with a call to the function SQLSetConnectAttr(), in JDBC with a call to the setAutoCommit() method.  In the mclient tool, it can be done when using it interactively with the \a command, and from the Apr2012 release onward, by using the -a command line option.
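
For instance, in an interactive mclient session a bulk of inserts could be wrapped as follows; once auto-commit is off, the changes must be committed explicitly:

sql>\a
sql>INSERT INTO table VALUES (1, 2, 3);
...
sql>COMMIT;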

Note that the variant

INSERT INTO table VALUES (1, 2, 3), (3, 4, 5), (6, 7, 8);

is also supported by MonetDB.  However, this variant is not to be recommended for large numbers of tuples since it uses a lot of stack space on the server, and stack space is a very limited commodity.  In other words, if you try to insert too many tuples in one go, the query will fail, or worse, the server will crash.

A slightly faster way of inserting data is to use a prepared query. In ODBC you would call:

SQLPrepare(hdl, "INSERT INTO table VALUES (?, ?, ?)", SQL_NTS);

you will also need to call SQLBindParameter() for each of the parameters in the query, and then call

SQLExecute(hdl);

for each insert.  As before, all insert queries should be done in a single transaction.

In JDBC a similar sequence using a PreparedStatement can be used. Also consider using the JDBC batching functionality.
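
The same idea can also be tried quickly from the mclient console, since MonetDB supports prepared statements at the SQL level as well. A sketch (the statement id printed by PREPARE may differ):

sql>PREPARE INSERT INTO table VALUES (?, ?, ?);
sql>EXEC 1(1, 2, 3);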

The methods discussed so far share another disadvantage: even if the server has multiple CPUs/cores, the server will only use a single one at a time because each insert is a separate query.  There is also a way to more fully use the CPU power of the system: COPY INTO.  COPY INTO is a single query that takes a complete file of comma-separated values (CSV) and inserts the data in one go.  The values don't actually have to be comma-separated; any delimiter will do.  Since COPY INTO is a single query, whether or not auto-commit mode is enabled doesn't matter.

COPY INTO comes in many variants.  The easiest form is:

COPY INTO table FROM 'file';

The 'file' must be an SQL string that contains the absolute path name of the file to be inserted.  The file is read directly by the server, and so has to be accessible to the server.  By default, the field delimiter is '|', i.e. a vertical bar, and the record delimiter is '\n', i.e. a newline.  The delimiters can be changed:

COPY INTO table from 'file' USING DELIMITERS '|','\n';

Often CSV-encoded files use quote characters to surround field values.  By default, no quote characters are recognized, but by using the following form, they will be recognized:

COPY INTO table from 'file' USING DELIMITERS '|','\n','"';

When using quote characters, the other delimiters can be part of the value, and the quote character itself can be part of the value if it is preceded by a \ (backslash) or if it is doubled.  The backslash itself must also be doubled.
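
For example, with the delimiters given above the following single record has three fields; the second embeds the field separator and the third embeds the quote character, once doubled and once backslash-escaped (the values are illustrative):

"one"|"two | with a bar"|"a ""quoted"" or \" escaped quote"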

Often NULL values need to be inserted as well.  By default, a NULL value is indicated by using NULL in the CSV file, but often the actual value used is different, e.g. no value.  This can be indicated by using:

COPY INTO table from 'file' USING DELIMITERS '|','\n','"' NULL AS '';

When the data is not directly accessible to the server but is accessible to the client, the 'file' in the above queries can be replaced by STDIN:

COPY INTO table FROM STDIN;

When using STDIN, the contents of the CSV file should follow the query.  This form is not directly usable in ODBC or JDBC, but see SQLcopyinto.java for an example of how to use this from Java.  If using the mclient tool and the CSV data is in a file by itself, you can use the command (also see the mclient manual):

mclient -d database -s "COPY INTO table FROM STDIN USING DELIMITERS ',','\\n','\"'" - < file

The COPY INTO forms mentioned so far share one important drawback.  When a lot of data is to be inserted, the server doesn't know how much memory to allocate for the tables, and so will likely allocate too little.  This means that during the insertion process, the server has to grow the allocated memory area.  This can be an expensive operation.  Therefore, it is better to give the server a count of how many records are to be inserted:

COPY n RECORDS INTO table FROM 'file';

Here n should be a number that is at least as large as the actual number of records to be inserted.  If the file contains more than n records, only n will be inserted, if the file contains fewer, all values will be inserted.  Giving a higher number is especially useful if multiple COPY INTO queries are to be done on the same table.  The first COPY INTO, when the table is still empty, should be the total count of to-be-inserted values so that the server will allocate enough memory when the tables are first created (they are only really created once data is inserted).
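
For instance, when a table is filled from two files of about one million records each, the first COPY INTO can announce the grand total so that enough space is allocated up front (the counts are illustrative):

COPY 2000000 RECORDS INTO table FROM 'file1';
COPY 1000000 RECORDS INTO table FROM 'file2';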

This form is the fastest way of inserting data into MonetDB.

One more elaboration of the COPY INTO form is to specify an offset into the CSV file.  Often CSV files contain header information that is not part of the actual data.  This header information can be skipped by using the form

COPY n OFFSET m RECORDS INTO table FROM 'file';

Here n is the number of records to insert and m is the first record to be inserted where the first record in the file is called 1 (i.e. counting is 1 based).
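
For example, to skip a single header line and load at most one million records starting at the second line of the file (the counts are illustrative):

COPY 1000000 OFFSET 2 RECORDS INTO table FROM 'file';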
 

Binary Bulk Loads

A database is often populated with the SQL COPY statement using an ASCII representation of a table with column and record separators, e.g. a CSV file. This scheme is preferred for almost all situations. Its performance is mostly determined by the parsing cost of turning the ASCII representation of values into the native binary one. The MonetDB copy process is highly optimized for multi-core systems, where several threads in parallel handle variable-length records efficiently.

When large tables are migrated between MonetDB instances, or when large tables are produced by a trusted external program, a slightly faster loading track can be used instead. This saves rendering of data into ASCII and subsequent parsing of the data being exchanged. For this to work, the user creates the binary version of the BAT storage and 'attaches' it to the SQL catalog using a variation of the COPY command.

To illustrate, consider the following snippet:

create table Tmp( i integer, f real, s string); 
copy binary into Tmp from ('path_to_file_i', 'path_to_file_f', 'path_to_file_s');

Each attachment file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of a C array. The SQL types char (1 byte), tinyint (8 bits), smallint (16 bits), int/integer (32 bits), and bigint (64 bits) correspond directly with C types; real and double are mapped onto the C types float and double, respectively. For variable-length strings, the file must have one C-based string value per line, terminated by a newline, and it is processed without escape character conversion. MonetDB assumes that all files are aligned, i.e. the i-th value in each file corresponds to the i-th record in the table.

All other types (including UTF8 and escaped characters) should be handled using the default COPY INTO language construct.

The file names must be absolute paths to the column input files. They must reside on the same file system as the database farm and they will replace the content of the table Tmp. The files with numeric data are moved into place to avoid copying. String input files are retained in their original location and should be removed by the user.

WARNING: currently, upon discovering an error in the input, the files are 'gone'.

Apache Pig

In many cases, the data that is supposed to be stored in MonetDB is created by a large-scale data aggregation process. Apache Pig is a popular tool for this task. MonetDB binary column files can be generated directly from Pig using the MonetDB-Pig-Layer.

Inside a Pig script, this is used as follows:

STORE raw INTO './results/' USING nl.cwi.da.monetdb.loader.hadoop.MonetDBStoreFunc;
 
This will generate an SQL schema file, a set of column files, and a load command. Keep in mind that currently only primitive Java types such as Integer, Long, Float, Double, String etc. are supported.

User Defined Functions

An open source solution provides a stepping stone for others to extend its kernel functionality with specific types and functions. Experience shows that the need for those is fairly limited. Often the built-in data types, the MAL algebra and functional abstraction provide the necessary toolkit to achieve your goal.

In the few cases where the MonetDB kernel and SQL runtime system need extensions, it calls for access to the source code of MonetDB and proficiency in C programming, compilation and debugging. The openness of MonetDB means that extensions are not sand-boxed; they run within the system address space. Moreover, the multi-layered architecture means you have to make the functions written in C known to the MAL interpreter before they can be made known to the SQL compiler. The current setup makes this a little more cumbersome, but the added benefit is that both simple scalar functions and columnar operations can be introduced.

In this section we show how to extend SQL with a simple scalar function to reverse a string, i.e.

sql> select 'hello', reverse('hello');
+-------+-------+
| hello | olleh |
+-------+-------+

step 1. You should be able to access, compile, and install MonetDB in a private directory.

step 2. Go to the sql/backends/monet5/UDF directory from the top of the source tree. It provides the reverse example as a template. A group of user-defined functions is assembled in a directory like UDF. It contains files that describe the SQL signature, the MAL signature, and the C-code implementation.

step 3. Extension starts with a definition of the MAL signatures. See the example given, or browse through the files in monetdb5/modules/mal/*.mal to get a glimpse of how to write them. The MonetDB kernel documentation provides more details. The file contains the MAL snippet:
command reverse(ra1:str):str
address UDFreverse
comment "Reverse a string";

step 4. The signature says that it expects a command body implementation under the name UDFreverse, shown below. The C signature is a direct mapping, where arguments are passed by reference and the return value references are the first in the argument list. The body should return a (malloced) string to denote an exception being raised, or MAL_SUCCEED upon success.
#include "udf.h"

static str
reverse(const char *src)
{
    size_t len;
    str ret, new;

    /* The scalar function returns the new space */
    len = strlen(src);
    ret = new = GDKmalloc(len + 1);
    if (new == NULL)
        return NULL;
    new[len] = 0;
    while (len > 0)
        *new++ = src[--len];
    return ret;
}

str
UDFreverse(str *ret, str *src)
{
    if (*src == 0 || strcmp(*src, str_nil) == 0)
        *ret = GDKstrdup(str_nil);
    else
        *ret = reverse(*src);
    if (*ret == NULL)
        /* allocation failed in either branch; raise an exception */
        return GDKstrdup("udf.reverse: allocation failure");
    return MAL_SUCCEED;
}

step 5. The next step is to administer the routine in the SQL catalog. This calls for an SQL statement to be executed once for each database. The autoload method can relieve you from loading the modules manually in the server after each restart. The UDF template contains the files 80_udf.sql and 80_udf.mal. The former contains the definition needed for SQL:
create function reverse(src string)
returns string external name udf.reverse;

step 6. The MAL interpreter should be informed about the linked-in functionality. This is facilitated using an autoload feature too. The MAL script simply contains the module signature.

include udf;

step 7. After all pieces are prepared, you have to call the bootstrap program in the root of your checked-out source tree once. Thereafter, a configure/make/make install cycle compiles the code and places the interface files and libraries in the proper place.

Creation of bulk and polymorphic operations requires much more care. In general, it is best to find an instruction that is already close to what you need. Clone it, expand it, compile it, and test it. A bulk variation of the reverse operation is included in the sample UDF template. As a last resort you can contact us on the mailing lists for further advice.
 

Optimizer Recipes

Every SQL query passes through a series of optimizer steps to arrive at a more efficient execution plan. The steps taken are denoted as an 'optimizer pipe' identified by a name, e.g. default_pipe. The active pipe is held in the SQL global variable optimizer, e.g.

sql>select optimizer;
+--------------+
| single_value |
+==============+
| default_pipe |
+--------------+
1 tuple (0.863ms)

 

Query History

[TO BE REPLACED, it does not support concurrent use] The SQL implementation comes with a simple query profiler to detect expensive queries. It is centered around two predefined tables that store information about all SQL query definitions and their execution times.

The history table is controlled by the boolean system variable 'history'. After this variable is set, each query compiled and added to the cache is also entered into the 'queryHistory' table using a hardwired call to the procedure 'keepQuery'.

create table queryHistory(
    id wrd primary key,
    defined timestamp,  -- when entered into the cache
    name string,        -- database user name
    query string,       -- original text
    parse bigint,       -- time in usec
    optimize bigint     -- time in usec
);

The query performance is stored in the 'callHistory' table. The key timing attributes are 'xtime', i.e. the time to prepare the answer, and 'rtime', i.e. the time to render the result set and send it to the user. All timings are in usec.

The remaining parameters illustrate the resource claims. The 'tuples' attribute denotes the size of the result set in number of rows. The 'foot' attribute depicts the maximum amount of memory claimed to keep all relevant intermediates and persistent BATs in virtual memory at any given time during query execution. The 'memory' parameter is the total amount of BAT storage claimed during query execution. The 'inblock' and 'oublock' attributes indicate the number of physical IOs during execution.

create table sys.callHistory(
    id wrd references queryHistory(id), -- references query plan
    ctime timestamp,    -- time the first statement was executed
    arguments string,
    exec bigint,        -- time from the first statement until result export
    result bigint,      -- time to ship the result set to the client
    foot bigint,        -- footprint for all bats in the plan in bytes
    memory bigint,      -- storage size of intermediates created in bytes
    tuples wrd,         -- number of tuples in the result set
    inblock bigint,     -- number of physical blocks read
    oublock bigint      -- number of physical blocks written
);

create view queryLog as
select * from sys.queryHistory qd, sys.callHistory ql
where qd.id= ql.id;

The following code snippet illustrates its use.

sql>set history=true;
sql>select 1;
+--------------+
| single_value |
+==============+
|            1 |
+--------------+
1 tuple
sql>select 1;
+--------------+
| single_value |
+==============+
|            1 |
+--------------+
1 tuple
sql>select * from sys.queryHistory;
+---------+----------------------------+---------+-----------------------------+-------+----------+
| id      | defined                    | name    | query                       | parse | optimize |
+=========+============================+=========+=============================+=======+==========+
| 1010556 | 2010-05-15 22:07:29.000000 | monetdb | select 1;                   |  3473 |      224 |
| 1010655 | 2010-05-15 22:07:47.000000 | monetdb | select * from queryhistory; |   593 |     1090 |
+---------+----------------------------+---------+-----------------------------+-------+----------+
2 tuples
sql>select * from sys.callHistory;
+---------+----------------------------+---------------+-------+-------+--------+--------+--------+---------+---------+
| id      | ctime                      | arguments     | exec  | result| foot   | memory | tuples | inblock | oublock |
+=========+============================+===============+=======+=======+========+========+========+=========+=========+
| 1010556 | 2010-05-15 22:07:29.000000 | user.s2_1(1); |     0 |     6 |      0 |      0 |      1 |       0 |       0 |
| 1010556 | 2010-05-15 22:07:31.000000 | user.s2_1(1); |     1 |     6 |      0 |      0 |      1 |       0 |       0 |
| 1010655 | 2010-05-15 22:07:47.000000 | user.s3_1();  |   731 |    95 | 149456 | 149456 |      2 |       0 |       0 |
+---------+----------------------------+---------------+-------+-------+--------+--------+--------+---------+---------+
3 tuples
sql>select id, query, avg(exec) from sys.queryLog group by id,query;
+---------+-----------------------------+------------------------+
| id      | query                       | L4                     |
+=========+=============================+========================+
| 1010556 | select 1;                   |                    0.5 |
| 1010655 | select * from queryhistory; |                    731 |
| 1010744 | select * from callhistory;  |                   1100 |
+---------+-----------------------------+------------------------+
3 tuples
sql>call sys.resetHistory();

QueryHistory and callHistory are ordinary tables linked with a foreign-key constraint, which can be cleaned using the procedure resetHistory.

Optimizer Pipelines

The code produced by MonetDB/SQL is massaged by an optimizer pipeline.  The pipeline to be used is identified by the SQL global variable optimizer, which can be modified using a SQL assignment, e.g. the minimal optimizer pipeline is set using:

sql>set optimizer='minimal_pipe';
sql>select optimizer;
+--------------+
| single value |
+==============+
| minimal_pipe |
+--------------+

 

Each pipeline consists of a sequence of MAL function calls that inspect and transform the plan for better execution. The preferred optimizer can be changed by either providing its name or the complete MAL function sequence. For example, the minimal pipe can also be set using:

sql> set optimizer='optimizer.inline();optimizer.remap();optimizer.deadcode();optimizer.multiplex();optimizer.garbageCollector();';

The final result of the optimizer steps becomes visible using the SQL EXPLAIN statement modifier. Alternatively, the SQL DEBUG statement modifier in combination with the 'o' command provides access to the intermediate optimizer results.

Several optimizer pipelines are pre-defined in the MonetDB sources. They represent ongoing development activities and experimental code. Their scope of applicability and/or stability has not yet reached a level satisfactory for inclusion in the default pipeline. The predefined optimizer pipelines can be inspected as follows (as of the Apr2012 release):

sql>select * from optimizers();

/* TBD */

A user-defined optimizer pipeline is checked against the dependency information maintained in the optimizer library to ensure there are no conflicts and that at least the prerequisite optimizers are used.

The default SQL optimization pipeline contains the following steps:

inline Inline functions identified as such.
remap Locate hardwired multiplex operations.
evaluate Evaluate constant expressions.
costModel Inspect the SQL catalog for size information.
coercions Perform static type coercions.
emptySet Remove empty set expressions.
aliases Remove alias assignments.
mitosis Horizontal fragmentation.
mergetable Expand horizontally fragmented plans.
commonterms Remove duplicate expressions.
joinPath Search multiple joins and glue them together for better runtime optimization.
deadcode Remove all code not leading to used results.
reduce Reduce the stack space for faster calls.
garbageCollector Inject calls to the garbage collector to free up space.
dataflow Prepare code for multi-core execution.
multiplex Expand all remaining multiplex operations to iterators.


Cluster Management

Topic: how to set up a cluster of three MonetDB servers belonging to the same group.

Building a distributed system using the MonetDB infrastructure requires a number of machines that run monetdbd (see its man page for details). Assume three machines act as workers, called cruquius, lijnden and leeghwater. Provided MonetDB/SQL has been compiled and installed on each of the workers, perform the following steps on each worker to get monetdbd running:

worker% monetdbd create /fullpath/to/mydbfarm
worker% monetdbd set passphrase=mypassphrase /fullpath/to/mydbfarm
worker% monetdbd start /fullpath/to/mydbfarm

This creates the location for the database on each worker and sets the private control passphrase for the MonetDB daemon to "mypassphrase". The final step is to activate the daemon itself. This is basically all you need to initialise a cluster of MonetDB instances.

Assume you have likewise initialized MonetDB on your desktop machine, named willem, from which you will control the MonetDB instances. On the desktop machine you can use the following command to show which machines guarded with monetdbd can be reached:

willem% monetdb discover

Amongst others it will show you the workers you just initialized. From our machine willem we can further work with the cluster by adding a database called pump on each node as follows:

willem% monetdb -h cruquius -P mypassphrase create pump
willem% monetdb -h lijnden -P mypassphrase create pump
willem% monetdb -h leeghwater -P mypassphrase create pump

The discover command will not show them yet, as they are still in maintenance mode.

Beware, these three identically named databases are not related to each other in any way, i.e. access control and database content can be different for each individual database pump. They are logical names to be refined using role tags to designate their function. Consider we use two server instances to act as harvey[1,2] and one as fox. How these functionalities behave is beyond the control of the MonetDB daemon. It suffices to set up the shared property tag to designate them all to belong to the same group.

willem% monetdb -h cruquius -P mypassphrase set shared=pump/2/harvey pump
willem% monetdb -h lijnden -P mypassphrase set shared=pump/2/fox pump
willem% monetdb -h leeghwater -P mypassphrase set shared=pump/1/harvey pump

Now that we have set them up, they can be made available for use:

willem% monetdb -h cruquius -P mypassphrase release pump
willem% monetdb -h lijnden -P mypassphrase release pump
willem% monetdb -h leeghwater -P mypassphrase release pump

An mclient connection can now go to any of the hosts directly, or use a pattern to find the right machine. Given the example above with our tags, the following two mclient calls connect to lijnden, and round-robin between cruquius and leeghwater, respectively:

willem% mclient -d '*/pump/2'
willem% mclient -d '*/pump/*/harvey'

Data Vaults

Modern data warehouses are not confined to strict tabular data loaded into the database using the COPY INTO statement. In many areas, most notably in the scientific domains, proprietary file formats have been designed, augmented with libraries to create and inspect them. MonetDB recognizes the value of such choices and provides an easy way to open up these treasure chests for SQL querying. Therefore, it introduces a series of data vaults, each geared towards supporting a well-defined foreign file format.

This topic is the focus in ongoing research [1].

Database Sampling

Sampling a database is an essential step to improve the response time for database exploration. In the context of the SciBORQ project, we introduce a number of sampling techniques in the MonetDB software stack. Our goal is to provide methods for performing sampling (uniform and weighted) over a) the result of a query, b) the base tables, and c) the entire database schema. Sampling can be performed during query execution, as well as during data loading in the case of predefined sampling indexes. The sampling methods included in the distribution are described below.

Uniform Sampling.
A new SQL operator has been added to support sampling the result of a query.

sample_column:
    SELECT ... FROM ... WHERE ... [ UNIFORM ] SAMPLE <expr>

If <expr> is an integer literal greater than 1, it defines the number of rows to be included in the sample. If <expr> is a real literal in the range [0.0, 1.0], it refers to the fraction of the result set to be sampled. For example, if <expr> is 0.3 then the sample will contain 30% of the rows in the query result.
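
For example (the table name and sizes are illustrative):

SELECT * FROM lineitem SAMPLE 1000;  -- a sample of 1000 rows
SELECT * FROM lineitem SAMPLE 0.3;   -- a sample of 30% of the result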

SAMPLE is treated as a variation of the LIMIT, ORDER BY, etc. clauses, which means it can only appear in the outermost SELECT clause. SAMPLE cannot appear in a subquery, but this can be circumvented using a table-producing function, for example

CREATE FUNCTION mysample ()
RETURNS TABLE(col a,...)
BEGIN
   RETURN
     SELECT a,...
     FROM name_table
     SAMPLE 100;
END;

and then use the function mysample() to create a sampled table, e.g.:

INSERT INTO sample_table (SELECT * FROM mysample());

The implementation of uniform sampling is based on Algorithm A as described in the paper "Faster Methods for Random Sampling" by Jeffrey Scott Vitter [1]. Algorithm A is not the fastest one, but it makes only <expr> calls to the function random() and it is simpler than the other, more complex and CPU-intensive, algorithms in the literature. Instead of performing one random experiment for each row to decide if it should be included in the sample or not, Algorithm A skips a number of rows and includes the next row found. The algorithm scans the input relation sequentially and maintains the uniqueness and sort properties. The sample is without replacement.
 

Database migration

The easiest method to move a MonetDB database to a new location, or to create a copy for back-up purposes, runs as follows. First stop the database server by calling the monetdb management tool:

bash> monetdb lock dbname

When the server is stopped, you can still connect as a database administrator to ensure that all (update) queries have ceased execution (see system monitor). Once satisfied that the activity has ceased, you can either follow the safe route to dump/restore a database, or perform a file-level rsync or rcp to copy the complete <dbfarm>/<dbname> directory to another dbfarm location. The MonetDB daemon process attached to the remote <dbfarm> will pick it up and make it visible for connections. Finally, perform the following operation to continue processing user requests at the original server:

bash> monetdb release dbname
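
Putting the steps together, a complete copy session might look as follows; the paths, host name and rsync flags are illustrative:

bash> monetdb lock dbname
bash> rsync -av /path/to/dbfarm/dbname backuphost:/path/to/dbfarm/
bash> monetdb release dbname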

Query timing

Timing a query execution is supported in multiple ways, but largely depends on what you want to measure: point-to-point wall-clock time, or the actual behavior of kernel operations.

(1) The baseline is to use simple command line tools, such as time on Linux, to assess the performance of running a script against mclient. Beware that /bin/time and /usr/bin/time are not the same; they mainly measure and report the wall-clock time spent by the given command/process. See their respective man pages for details.

(2) The next approach is to use the "--interactive" option of the mclient tool, which will report on the timing of each individual SQL query in a script in easy human consumable terms. It returns the wall-clock time between sending the query to the server and receiving the first block of answers. Its rendering can be controlled (see mclient).

(3) The query history can also be maintained in a separate log for post analysis. (see description)

Thus, (1) includes everything from loading the mclient binary and starting the mclient process, parsing the query in mclient, sending it to the server, having the server execute the query and serialize the result, sending the result back to the client, to the client receiving, parsing and rendering the result, and sending the result to /dev/null ("for free"), to a file (I/O), or to a terminal (scrolling). (2) merely includes the time the server spends on receiving and executing the query and creating the result. The abovementioned costs on the client side to receive, parse, render, etc. the result are excluded.

The detailed timing of an SQL query can be obtained by prepending the query with the modifier TRACE. It will produce a queryable table with a breakdown of all relational algebra operations (see the TRACE command). The offline tools stethoscope and tomograph provide further details for those interested in the inner workings of the system. They provide hooks to many system parameters, e.g. input/output, CPU cycles, thread use, ...
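
For example, prefixing a query with the modifier is all that is needed (the query itself is illustrative):

sql>TRACE SELECT count(*) FROM lineitem;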

Timing a database query should be done with care. Often you will notice differences in response time when the same query is run multiple times. The underlying cause can be that the data itself resides on disk (slow) or is already available in the memory caches (fast), or that a single user runs queries (fast) or has to compete with other users (slow), including competing with other processes on your box fighting over CPU, memory, and IO resources. As a precaution you might want to flush the system caches. On Windows a cache flushing tool can be used: press the "Flush Cache WS" and "Flush All Standby" buttons. On Linux you have to create a little job that consumes all memory.

For more general information on running experiments and measuring time, see our performance tutorial.

Query History

The SQL implementation comes with a simple query profiler to detect expensive queries. It is centered around two predefined internal tables that store information about all SQL query definitions and their execution times. It is controlled by calling the procedure querylog.enable(), whereafter each query compiled and added to the cache is also entered into the 'querylog.catalog' table. Query logging stops by calling the procedure querylog.disable().

table sys.querylog_catalog(
   id oid,
   owner string,
   defined timestamp,
   query string,
   pipe string,
   optimize bigint -- time in microseconds
)

The query performance is stored in the table 'querylog_calls'. The owner of the query definition is also the one that will be referenced implicitly from the call events. The key timing attributes are 'run', i.e. the time to prepare the result set, and 'ship', i.e. the time to render the result set and send it to the client. All timings are in microseconds.

The remaining parameters illustrate the resource claims. The 'tuples' attribute denotes the size of the result set in number of rows. The 'space' attribute depicts the total size of all temporary columns created during query execution. Note that the space is allocated and freed during the query execution, leading to a much less demanding actual memory footprint. The 'cpu' load is derived from the operating system statistics (Linux only) and is given as a percentage. The same holds for the 'io' waiting time.

table sys.querylog_calls (
    id oid,            -- references query plan
    "start" timestamp, -- time the statement was started
    "stop" timestamp,  -- time the statement was completely finished
    arguments string,  -- actual call structure
    tuples wrd,        -- number of tuples in the result set
    run bigint,        -- time spent (in usec) until the result export
    ship bigint,       -- time spent (in usec) to ship the result set
    cpu int,           -- average cpu load percentage during execution
    io int,            -- percentage of time waiting for IO to finish
    space bigint       -- total storage size of intermediates created (in MB)
)

create view sys.querylog_history as
    select qd.*, ql."start",ql."stop", ql.arguments, ql.tuples, ql.run, ql.ship, ql.cpu, ql.space, ql.io
    from sys.querylog_catalog() qd, sys.querylog_calls() ql
    where qd.id = ql.id and qd.owner = user;

The following code snippet illustrates its use.

sql>call sys.querylog_enable();
sql>select 1;
sql>select 1;
sql>select * from sys.querylog_catalog;
sql>select * from sys.querylog_calls;
sql>select * from sys.querylog_history;
sql>select id, query, avg(run) from sys.querylog_history group by id,query;
sql>call sys.querylog_disable();
sql>call sys.querylog_empty();

 

Storage Model

The storage footprint for any given database schema can be obtained by inspecting the table-producing function storage(). To illustrate, below we see the storage characteristics of the lineitem table in TPCH SF-1. The column width for variable-length strings, e.g. l_shipinstruct and l_comment, represents the average length found using a sample over the underlying column storage.

sql>select * from storage() where "table" = 'lineitem';
+--------+----------+----------------------+---------+----------+---------+-----------+------------+-----------+---------+--------+
| schema | table    | column               | type    | location | count   | typewidth | columnsize | heapsize  | indices | sorted |
+========+==========+======================+=========+==========+=========+===========+============+===========+=========+========+
| sys    | lineitem | l_orderkey           | int     | 11/1175  | 6001215 |         4 |   24004860 |         0 |       0 | true   |
| sys    | lineitem | l_partkey            | int     | 11/1176  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_suppkey            | int     | 11/1177  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_linenumber         | int     | 12/1200  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_quantity           | int     | 12/1201  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_extendedprice      | decimal | 12/1202  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | l_discount           | decimal | 12/1203  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | l_tax                | decimal | 12/1204  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | l_returnflag         | varchar | 12/1205  | 6001215 |         1 |    6001215 |     10240 |       0 | false  |
| sys    | lineitem | l_linestatus         | varchar | 12/1206  | 6001215 |         1 |    6001215 |     10240 |       0 | false  |
| sys    | lineitem | l_shipdate           | date    | 12/1207  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_commitdate         | date    | 12/1210  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_receiptdate        | date    | 12/1211  | 6001215 |         4 |   24004860 |         0 |       0 | false  |
| sys    | lineitem | l_shipinstruct       | varchar | 12/1212  | 6001215 |        11 |    6001215 |     10240 |       0 | false  |
| sys    | lineitem | l_shipmode           | varchar | 12/1213  | 6001215 |         4 |    6001215 |     10240 |       0 | false  |
| sys    | lineitem | l_comment            | varchar | 12/1214  | 6001215 |        26 |   24004860 | 268435456 |       0 | false  |
| sys    | lineitem | lineitem_orderkey    | oid     | 14/1472  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | lineitem_partkey     | oid     | 14/1445  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | lineitem_suppkey     | oid     | 14/1461  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
| sys    | lineitem | lineitem_partsuppkey | oid     | 15/1503  | 6001215 |         8 |   48009720 |         0 |       0 | false  |
+--------+----------+----------------------+---------+----------+---------+-----------+------------+-----------+---------+--------+

 

To estimate the storage footprint for a variation of this schema, we first construct a relation with the model input. Thereafter it can be updated to reflect the expected database size and value-width properties. The footprint then becomes available as the table-producing function storagemodel(). Also be aware that dictionary encoding may have taken place, which leads to a much smaller storage footprint (see l_shipinstruct). Dictionary encoding is currently a runtime feature; it depends on the insertion sequence and the dictionary size. It may lead to an overshoot in the estimated size. Therefore, the size of varchar columns should be taken with a grain of salt.

sql>call storagemodelinit();
sql>update storagemodelinput set  count = 1000000 where "table"='lineitem';
sql>update storagemodelinput set  "distinct" = 1000 where "table"='lineitem' and "type"='varchar';
sql>update storagemodelinput set  "distinct" = 330000 where "table"='lineitem' and "column"='l_comment';
sql>select * from storagemodel() where "table" = 'lineitem';

+--------+----------+----------------------+---------+---------+-----------+------------+----------+---------+--------+
| schema | table    | column               | type    | count   | typewidth | columnsize | heapsize | indices | sorted |
+========+==========+======================+=========+=========+===========+============+==========+=========+========+
| sys    | lineitem | l_orderkey           | int     | 1000000 | 4         |    4000000 |        0 | 8000000 | true   |
| sys    | lineitem | l_partkey            | int     | 1000000 | 4         |    4000000 |        0 | 8000000 | false  |
| sys    | lineitem | l_suppkey            | int     | 1000000 | 4         |    4000000 |        0 | 8000000 | false  |
| sys    | lineitem | l_linenumber         | int     | 1000000 | 4         |    4000000 |        0 |       0 | false  |
| sys    | lineitem | l_quantity           | int     | 1000000 | 4         |    4000000 |        0 |       0 | false  |
| sys    | lineitem | l_extendedprice      | decimal | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_discount           | decimal | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_tax                | decimal | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_returnflag         | varchar | 1000000 | 1         |    1000000 |    11240 |       0 | false  |
| sys    | lineitem | l_linestatus         | varchar | 1000000 | 1         |    1000000 |    11240 |       0 | false  |
| sys    | lineitem | l_shipdate           | date    | 1000000 | 4         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_commitdate         | date    | 1000000 | 4         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_receiptdate        | date    | 1000000 | 4         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | l_shipinstruct       | varchar | 1000000 | 11        |    1000000 |    21240 |       0 | false  |
| sys    | lineitem | l_shipmode           | varchar | 1000000 | 4         |    1000000 |    14240 |       0 | false  |
| sys    | lineitem | l_comment            | varchar | 1000000 | 26        |    1000000 |  8590240 |       0 | false  |
| sys    | lineitem | lineitem_orderkey    | oid     | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | lineitem_partkey     | oid     | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | lineitem_suppkey     | oid     | 1000000 | 8         |    8000000 |        0 |       0 | false  |
| sys    | lineitem | lineitem_partsuppkey | oid     | 1000000 | 8         |    8000000 |        0 |       0 | false  |
+--------+----------+----------------------+---------+---------+-----------+------------+----------+---------+--------+

 

System monitor

The MonetDB kernel maintains an active queue of all running queries. This queue is available for all users to inspect the status of their own queries. The system administrator can inspect it to oversee the complete workload on the system. The queue is made visible as a table-producing function, called sys.queue().

sql>select * from sys.queue();
+------+---------+----------------------------+----------+----------+---------+------------+----------------------------+
| qtag | user    | started                    | estimate | progress | status  | tag        | query                      |
+======+=========+============================+==========+==========+=========+============+============================+
|  902 | monetdb | 2013-03-30 10:23:39.000000 | null     |     null | running | 15168688@0 | select * from sys.queue(); |
+------+---------+----------------------------+----------+----------+---------+------------+----------------------------+
1 tuple (0.446ms)

The schema structure is largely self-explanatory. If the query (template) is run multiple times, then the system can derive a progress indicator and calculate an estimated time of completion. The 'tag' column references the query log tables, provided this facility has been turned on. The initial column 'qtag' provides a key to each active query.

NOTE that PAUSE and STOP take effect at the first safe point within the query plan, which often is after the current MAL instruction has finished. For complex queries over large databases this may take seconds up to minutes.

sql> select * from sys.queue;
+------+---------+----------------------------+----------------------------+----------+---------+-----------+--------------------------+
| qtag | user    | started                    | estimate                   | progress | status  | tag       | query                    |
+======+=========+============================+============================+==========+=========+===========+==========================+
|   52 | monetdb | 2013-12-23 12:34:50.000000 | null                       |     null | running | 1349763@0 | select * from sys.queue; |
+------+---------+----------------------------+----------------------------+----------+---------+-----------+--------------------------+
1 tuple (2.242ms)

--- start a query in another window
sql>select * from sys.queue;
+------+---------+----------------------------+----------------------------+------+---------+-----------+-------------------------------------------------------------+
| qtag | user    | started                    | estimate                   | prog | status  | tag       | query                                                       |
:      :         :                            :                            : ress :         :           :                                                             :
+======+=========+============================+============================+======+=========+===========+=============================================================+
|   56 | monetdb | 2013-12-23 12:35:03.000000 | null                       | null | running | 1353354@0 | select                                                      |
:      :         :                            :                            :      :         :           :  sum(l_extendedprice * l_discount) as revenue               :
:      :         :                            :                            :      :         :           : from                                                        :
:      :         :                            :                            :      :         :           :  lineitem                                                   :
:      :         :                            :                            :      :         :           : where                                                       :
:      :         :                            :                            :      :         :           :  l_shipdate >= date \'1994-01-01\'                          :
:      :         :                            :                            :      :         :           :  and l_shipdate < date \'1994-01-01\' + interval \'1\' year :
:      :         :                            :                            :      :         :           :  and l_discount between 0.06 - 0.01 and 0.06 + 0.01         :
:      :         :                            :                            :      :         :           :  and l_quantity < 24;                                       :
|   57 | monetdb | 2013-12-23 12:35:10.000000 | null                       | null | running | 1349763@0 | select * from sys.queue;                                    |
+------+---------+----------------------------+----------------------------+------+---------+-----------+-------------------------------------------------------------+
2 tuples (1.404ms)
sql>call sys.pause(56);
+------+---------+----------------------------+----------------------------+------+---------+-----------+-------------------------------------------------------------+
| qtag | user    | started                    | estimate                   | prog | status  | tag       | query                                                       |
:      :         :                            :                            : ress :         :           :                                                             :
+======+=========+============================+============================+======+=========+===========+=============================================================+
|   56 | monetdb | 2013-12-23 12:35:03.000000 | null                       | null | paused  | 1353354@0 | select                                                      |
:      :         :                            :                            :      :         :           :  sum(l_extendedprice * l_discount) as revenue               :
:      :         :                            :                            :      :         :           : from                                                        :
:      :         :                            :                            :      :         :           :  lineitem                                                   :
:      :         :                            :                            :      :         :           : where                                                       :
:      :         :                            :                            :      :         :           :  l_shipdate >= date \'1994-01-01\'                          :
:      :         :                            :                            :      :         :           :  and l_shipdate < date \'1994-01-01\' + interval \'1\' year :
:      :         :                            :                            :      :         :           :  and l_discount between 0.06 - 0.01 and 0.06 + 0.01         :
:      :         :                            :                            :      :         :           :  and l_quantity < 24;                                       :
|   65 | monetdb | 2013-12-23 12:36:03.000000 | null                       | null | running | 1349763@0 | select * from sys.queue;                                    |
+------+---------+----------------------------+----------------------------+------+---------+-----------+-------------------------------------------------------------+
2 tuples (3.082ms)
sql>call sys.stop(56);
sql>select * from sys.queue;
+------+---------+----------------------------+----------------------------+----------+---------+-----------+--------------------------+
| qtag | user    | started                    | estimate                   | progress | status  | tag       | query                    |
+======+=========+============================+============================+==========+=========+===========+==========================+
|   67 | monetdb | 2013-12-23 12:36:22.000000 | null                       |     null | running | 1349763@0 | select * from sys.queue; |
+------+---------+----------------------------+----------------------------+----------+---------+-----------+--------------------------+
1 tuple (2.799ms)
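
A paused query can also be resumed rather than stopped; assuming the server provides sys.resume() as the counterpart of sys.pause(), the call mirrors the examples above:

sql>call sys.resume(56);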