Recipes book


This section contains essays to simplify the use and deployment of SQL on the MonetDB platform. The topics are inspired by ongoing projects or specific user requests.

Client Interfaces


Graphical User Interfaces

Graphical database utilities have been around for some time to support click-and-go interactions with a DBMS. They often support multiple database back-ends using a standard communication protocol, e.g. JDBC. MonetDB/JDBC makes such tools work flawlessly with the server and makes your experience with MonetDB/SQL even better. Recommended GUIs are SQuirreL and DbVisualizer.

The foundation for building your own applications rests on the programming language bindings provided.

MAPI Client


The mclient program is the universal command-line tool that implements the MAPI protocol for client-server interaction with MonetDB.

On a Windows platform it can be started using Start->MonetDB->MonetDB SQL Client. Alternatively, you can use the command window to start mclient.exe. Make sure that your environment variables are properly set so that the libraries of interest can be found.

On a Linux platform it provides readline functionality, which greatly improves user interaction. A history can be maintained to ease interaction over multiple sessions.

The default settings are geared towards establishing a guest connection to an SQL database served by a default server running on localhost. The -h hostname option specifies the machine on which the MonetDB server is running. If you communicate with a MonetDB server on the same machine, it can be omitted.

     Usage: mclient [ options ]

     Options are:
      -h hostname | --host=hostname    host or UNIX domain socket to connect to
      -p portnr   | --port=portnr      port to connect to
      -u user     | --user=user        user id
      -d database | --database=database  database to connect to
      -e          | --echo             echo the query
      -E charset  | --encoding=charset specify encoding (character set) of the terminal
      -f kind     | --format=kind      specify output format {csv,tab,raw,sql,xml}
      -H          | --history          load/save cmdline history (default off)
      -i          | --interactive      read stdin after command line args
      -l language | --language=lang    {sql,mal}
      -L logfile  | --log=logfile      save client/server interaction
      -s stmt     | --statement=stmt   run single statement
      -X          | --Xdebug           trace mapi network interaction
      -| cmd      | --pager=cmd        for pagination
      -?          | --help             show this usage message

     SQL specific options
      -n nullstr  | --null=nullstr     change NULL representation for sql, csv and tab output modes
      -r nr       | --rows=nr          for pagination
      -w nr       | --width=nr         for pagination
      -D          | --dump             create an SQL dump
      -N          | --inserts          use INSERT INTO statements when dumping

The default TCP port (mapi_port) used is 50000. If this port happens to be in use on the server machine (which generally is only the case if you run two MonetDB servers on it), you will have to use the -p port option to define the port on which the mserver is listening; otherwise it may be omitted. If there is more than one mserver running, you must also specify the database name with -d database. In that case, if the port you specified serves a different database, the connection will be redirected to the correct one. Note that the default port (and other default options) can be set in the server configuration file.
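
For example, a connection to a server on another machine and a non-default port could look like this (host, port and database name are placeholders for your own setup):

     shell> mclient -h dbhost -p 50001 -d mydatabase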

Within the context of each query language there are more options. They can be shown using the command \? from within the client, or via the command-line options listed above.

For SQL there are several knobs to tune for a better rendering of result tables (\w).

     shell> mclient -d database
     Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2011-SP2)
     Database: MonetDB v11.5.2, 'database'
     Type \q to quit, \? for a list of available commands
     auto commit mode: on
     sql>\?
     \?      - show this message
     \<file  - read input from file
     \>file  - save response in file, or stdout if no file is given
     \|cmd   - pipe result to process, or stop when no command is given
     \h      - show the readline history
     \t      - toggle timer
     \D table- dumps the table, or the complete database if none given.
     \d[Stvsfn]+ [obj] - list database objects, or describe if obj given
     \A      - enable auto commit
     \a      - disable auto commit
     \e      - echo the query in sql formatting mode
     \f      - format using a built-in renderer {csv,tab,raw,sql,xml}
     \w#     - set maximal page width (-1=unlimited, 0=terminal width, >0=limit to num)
     \r#     - set maximum rows per page (-1=raw)
     \L file - save client/server interaction
     \X      - trace mclient code
     \q      - terminate session

JDBC Client


Note: the jdbcclient tool is only provided for performing basic JDBC tests. It is not supported; mclient should be used instead.

The textual client using the JDBC protocol comes with several options to fine-tune the interaction with the database server. A synopsis of the calling arguments is given below:

     java -jar ${prefix}/share/monetdb/lib/jdbcclient.jar  \
             [-h host[:port]] [-p port] \
     		[-f file] [-u user] [-l language] [-d [database]] \
     		[-D [table]] [-e] [-X<opt>]

or using long option equivalents --host --port --file --user --language --dump --echo --database. Arguments may be written directly after the option like -p50000.

If no host and port are given, localhost and 50000 are assumed. A .monetdb file may exist in the user's home directory. This file can contain preferences to use each time the program is started. Options given on the command line override the preferences file. The .monetdb file syntax is <option>=<value>, where option is one of host, port, file, mode, debug, or password. Note that the last one is perilous and therefore not available as a command line option. If no input file is given using the -f flag, an interactive session is started on the terminal.
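
A minimal .monetdb preferences file could look as follows (the values shown are placeholders for your own setup):

     host=localhost
     port=50000
     password=monetdb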

NOTE The JDBC protocol does not support the SQL DEBUG <query> option. Use the mclient tool instead.

 

OPTIONS

-h --host

The hostname of the host that runs the MonetDB database. A port number can be supplied by use of a colon, i.e. -h somehost:12345.

-p --port

The port number to connect to.

-f --file

A file name to use either for reading or writing. The file will be used for writing when dump mode is used (-D --dump). In read mode, the file can also be a URL pointing to a plain text file that is optionally gzip compressed.

-u --user

The username to use when connecting to the database.

-d --database

Try to connect to the given database (only makes sense if connecting to monetdbd).

-l --language

Use the given language, defaults to 'sql'.

--help

This screen.

--version

Display driver version and exit.

-e --echo

Also outputs the contents of the input file, if any.

-q --quiet

Suppress printing the welcome header.

-D --dump

Dumps the given table(s), or the complete database if none given.

EXTRA OPTIONS

-Xdebug

Writes a transmission log to disk for debugging purposes. If a file name is given, it is used, otherwise a file called monet<timestamp>.log is created. A given file will never be overwritten; instead a unique variation of the file is used.

-Xhash

Use the given hash algorithm during challenge response. Supported algorithm names: SHA1, MD5, plain.

-Xoutput

The output mode when dumping. Default is sql, xml may be used for an experimental XML output.

-Xbatching

Indicates that a batch should be used instead of direct communication with the server for each statement. If a number is given, it is used as the batch size, i.e. 8000 would execute the contents of the batch after each 8000 rows read. Batching can greatly speed up the process of restoring a database dump.

 

DbVisualizer


DbVisualizer is a platform independent tool aimed at simplifying database development and management for database administrators and developers. It is a very cool tool (it can even draw dependency graphs based on the schema and foreign keys).

Free, personal-use versions are available from their website. Download and install the software. The following scheme works for their version 6.1.

After starting DbVisualizer for the first time, it will load its default welcome screen and start the wizard to select a database driver. Cancel this wizard and open the Driver Manager, using the menu Tools->Driver Manager... In the Driver Manager add a new driver using Driver->Create Driver... Type the name of the driver in the Name field, e.g. MonetDB. Type the URL format for the driver: jdbc:monetdb://hostname/database. In the Driver File Paths box, follow the directions to load a JAR file that contains the JDBC driver. Use the MonetDB JDBC driver that came with your distribution, typically installed in ${prefix}/share/monetdb/lib/monetdb-jdbc-X.Y.jar.

After adding, the window should list the driver class nl.cwi.monetdb.jdbc.MonetDriver and automatically enter it in the Driver Class field. Close the Driver Manager window.

From the menu select Database->Create Database Connection. A pop-up dialog will try to persuade you to use the wizard. You know better, so click "No". Fill in the name for the connection in the Alias field, e.g. MonetDB. Select the MonetDB driver from the list. Copy the default URL by clicking on the "URL Format: ..." text field and change it to reflect the right hostname (usually localhost will do). Fill in the default userid and password (monetdb). Press the "Connect" button. It will report the database being used and the JDBC driver in use. In the left pane the MonetDB database now becomes available for browsing.

Explore the application and have fun!

A caveat of the free version is its performance on SQL scripts. They are sent as a single string to the server for execution, which is not optimal for MonetDB. Batch scripts are better run from the MonetDB SQL client (mclient).

Pentaho


The Pentaho suite comprises several applications for Data Integration, Business Analytics and Reporting. It comes in both a free Community edition and an Enterprise edition. Written (mostly) in Java, the Pentaho suite has great support for MonetDB. There are a few tools in the suite directly useful to MonetDB users:

  • ETL: Pentaho Data Integration (PDI), formerly known as Kettle, is a powerful tool with numerous versatile components for data Extract, Transform and Load. PDI ships with a bulk loader for MonetDB and a JDBC driver for reading data from MonetDB.
  • Business Analytics: Pentaho Interactive Reporting and Dashboard Designer are tools useful for (as their names suggest) reporting and designing interactive web dashboards. 

All of the above tools, plus others, are available as independently downloadable community-supported applications, as well as integrated into the Pentaho Business Analytics suite.

To read or write data from/to MonetDB in PDI, one needs to create a database connection, which is established with the MonetDB JDBC driver that ships with PDI. To create a new connection, start PDI and create or open a job or transformation.

1) Select Tools -> Wizard -> Create database connection.

2) Name the connection (e.g. MonetDB), select MonetDB from the database list and Native (JDBC) from the access type list.

3) Enter hostname (e.g. localhost), port (50000 by default) and your database name.

4) Enter the username and password for your database.

You can click on the Test database connection button to verify it works and then close the dialog.

In a PDI transformation you can use the MonetDB Bulk Loader to quickly load data processed by the tool. To use the bulk-loading component, add it to your transformation, connect it to your stream and configure it to use the pre-configured connection from the Database connection drop-down menu. Next, specify in which table the data should be loaded and which fields from the PDI data stream should be written.

1) Type in the Target schema name (sys by default).

2) Type in or browse to select the Target table.

3) Open the Output Fields tab and select the Incoming Stream fields that should be loaded in MonetDB.

A few notes:

- The first column (Target Table fields) is not used for mapping the data stream fields to table columns; instead, only the order in which the Incoming Stream fields are arranged is taken into account. In other words, if a field called 'name' is arranged as the first incoming field, it will be mapped to the first column of the target table.

- If the target table does not exist, PDI will attempt to create it, inferring the column types from the types of the incoming data fields. Due to possible mismatches between data types, we recommend creating the table beforehand. Clicking on the SQL button in the dialog will display the SQL statement used for creating the table, including the data types.

ReportServer


ReportServer is a business intelligence platform that offers various reporting and visualisation tools. It comes in both a free Community edition and an Enterprise edition. More information can be found on their website. The tool most useful to MonetDB users is probably ReportServer's Dynamic List (its ad-hoc reporting component), which allows you to easily transform large MonetDB data tables into meaningful reports to be exported, for example, to Excel.

To set up ReportServer to talk to your MonetDB instance you will need to register the MonetDB JDBC driver. To register the JDBC driver with ReportServer you need to place the monetdb-jdbc-X.Y.jar (where X and Y are major and minor version numbers) into ReportServer's lib directory. If you installed ReportServer via one of the installers then the lib directory will be located in INSTALL_DIR/apps/reportserver/reportserver-conf/lib. If you did a manual installation then the lib directory will be /opt/reportserver/lib (on UNIX systems), or C:\Program Files\reportserver\lib on Windows. Once the driver is in the correct location, restart ReportServer.

In the following we quickly go through the basic steps to set up a Dynamic List on top of a MonetDB database table. Basically, you will need to do two things for this:

1) Create a datasource in ReportServer that points to your MonetDB installation
2) Create a Dynamic List based on that datasource.

 

To create a datasource for MonetDB go to the Administration Module and then to Datasources. In the datasource tree right click on a folder and select Insert > Relational Database. In the "Edit datasource" window, provide a Name for the datasource and select MonetDB as the Database (if the JDBC driver was not put into the right location, the MonetDB item will state "driver missing"). Username and Password are the username and password of the MonetDB database user with which you would like to connect to your database. Finally, the URL should be in the form jdbc:monetdb://HOST:PORT/DATABASE_NAME. For example, if your MonetDB instance is running on the same machine and on its default port, and your database is named voc, then the following URL could be used: jdbc:monetdb://localhost:50000/voc.

Save your changes by clicking the "Apply" button. To test whether the stored changes allow you to properly connect you can use the "Test Connection" button from the top toolbar (remember to always first save the changes before testing).

Now that we have a datasource, let us create a Dynamic List. For this we go to Administration > Reports and right click on a folder within the report tree to select Insert > Dynamic List. The basic configuration of a Dynamic List only needs a name, a database connection and a base query. That is, once you select your previously created MonetDB datasource, ReportServer will ask you to provide a query. This query forms the base of the Dynamic List. You can use any SQL statement that can be used as an inner query. In case you simply want to base a report on a single data table, say, voc.passengers from the voc database, the query would be: SELECT * FROM passengers.

Once you've configured the Dynamic List you can start using it by simply double clicking the item in the tree. 

An introduction to the Dynamic List, as well as further documentation, is available on the ReportServer website.

SQL Workbench/J


SQL Workbench/J is a tool for working with SQL databases, written in Java. You can download it from the official website.

JDBC Driver

While SQL Workbench has some support for MonetDB, you will still need to download the MonetDB JDBC driver and quickly configure it. After starting SQL Workbench for the first time, it will load its default welcome screen and start the wizard to select a database driver. Cancel this wizard and open the Driver manager, using the menu File->Manage Drivers.

1) Add a new driver by clicking on the button in the top-left corner of the dialog and type in a name for the driver, e.g. MonetDB.

2) Click on the folder-like icon and navigate to the previously downloaded JDBC .jar file.

3) The class name will be filled in on its own, but in case it is not, it should be: nl.cwi.monetdb.jdbc.MonetDriver.

4) Type in a sample URL for database connection, e.g. jdbc:monetdb://hostname:50000/database.

Database Connection

Next, to connect to a MonetDB database instance, open the File->Connect window:

1) Type in a name for the connection.

2) Select the MonetDB driver from the drop-down menu.

3) Enter the details of your database connection, replacing the hostname, port and database name in the sample URL, e.g. jdbc:monetdb://localhost:50000/db for an instance running on localhost, port 50000 with database db.

4) Enter the username and password for the database.

Confirm the connection and you are done.

SQuirreL


SQuirreL SQL Client is a graphical Java program to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands, etc. It is a very cool tool. It can even draw dependency graphs based on the schema and foreign keys. The latest version is available from their website. Download and install the software. The following scheme works for version 3.4.

After starting SQuirreL for the first time, it will load its default welcome screen. Locate the Drivers window and click to add a driver.

[Screenshot: SQuirreL driver dialog configured for MonetDB]

In the pop-up form enter the driver name MonetDB, an example URL pointing to the database of interest jdbc:monetdb://<host>[:<port>]/<database>, and the website https://www.monetdb.org/Home. Now go to Extra Class Path and add the MonetDB JDBC driver jar file. The latest version can be downloaded from the MonetDB repository, or use the one that came with your distribution, typically installed in ${prefix}/share/monetdb/lib/monetdb-jdbc-X.Y.jar. Next, click on the List Drivers button and it will insert nl.cwi.monetdb.jdbc.MonetDriver into the Class Name field. Click OK.

Now create a new Alias to setup a connection. You will have to edit the alias to direct it to the proper database server (change URL) and use the proper user credentials. Explore the application and have fun!

Tableau


Tableau is a popular data visualization and business intelligence tool. Using the MonetDB ODBC driver, you can easily access your data. You can find more information about Tableau on their website.

Note: Currently only the Windows version of Tableau Desktop supports ODBC connections. Tableau Public and the OS X versions do not support ODBC connections.

To configure a connection to a MonetDB database, download and install Tableau, followed by the MonetDB ODBC driver. Be careful to download the appropriate version (32 or 64-bit) based on your system - 64-bit Tableau will only work with a 64-bit ODBC driver on 64-bit Windows.

Next (after having started your database instance and loaded your data), open Tableau to configure the connection. From the welcome screen select Connect to data, scroll to the bottom of the On a server list and select Other Databases (ODBC). After that fill in the required details for the connection to the MonetDB server:

1) Select DSN and choose MonetDB from the drop-down menu.

2) Click connect to initialize the driver. If no system-wide DSN has been configured, you can now enter the details for database connection.

3) Type in the hostname of the server - localhost in our case.

4) Type in the port of the database server - 50000 by default.

5) Enter the database name - demo is the default database on Windows MonetDB installations.

6) Type in the username and password for the database.

Click OK and Tableau will establish a connection with the database instance.

Next, to get ready for data exploration or reporting you need to select which tables should be loaded in a Tableau worksheet.

1) Open the Schema drop-down menu, click on the search button to list all schemas and select the one with your data tables - sys by default.

2) Open the Table drop-down menu and click on the search button to list all tables. Selecting more than one table will usually bring up a menu to join the rows in a single query. Alternatively you can write your own Custom SQL query.

3) Select the table(s) you want to be added for visualisation. Depending on the data size, you might want to enable Automatically Update the table contents.

4) Click on the Go to Worksheet button.

That's it, you can now explore and visualise the data with Tableau.

You can also find more about Tableau and ODBC connections on their website.

User Defined Functions


An open source solution provides a stepping stone for others to extend its kernel functionality with specific types and functions. Experience shows that the need for those is fairly limited. Often the built-in data types, the MAL algebra and functional abstraction provide the necessary toolkit to achieve your goal.

In the few cases where the MonetDB kernel and SQL runtime system need extensions, this calls for access to the source code of MonetDB and proficiency in C programming, compilation and debugging. The openness of MonetDB means that extensions are not sand-boxed; they run within the system address space. Moreover, the multi-layered architecture means you have to make the functions written in C known to the MAL interpreter before they can be made known to the SQL compiler. The current setup makes this a little more cumbersome, but the added benefit is that both simple scalar functions and columnar operations can be introduced.

In this section we show how to extend SQL with a simple scalar function to reverse a string, i.e.

sql> select 'hello',reverse('hello');
+-------+-------+
| hello | olleh |
+-------+-------+

step 1. You should have access to the MonetDB sources and be able to compile and install MonetDB in a private directory.

step 2. Go to the sql/backends/monet5/UDF directory from the sources top directory. It provides the reverse example as a template. A group of user-defined functions is assembled in a directory like UDF. It contains files that describe the SQL signature, the MAL signature, and the C-code implementation.

step 3. Extension starts with a definition of the MAL signatures. See the example given, or browse through the files in monetdb5/modules/mal/*.mal to get a glimpse of how to write them. The MonetDB kernel documentation provides more details. The file contains the MAL snippet:
command reverse(ra1:str):str
address UDFreverse
comment "Reverse a string";

step 4. The signature says that it expects a command body implementation under the name UDFreverse, shown below. The C signature is a direct mapping, where arguments are passed by reference and the return value references come first in the argument list. The body should return a (malloced) string to denote an exception being raised, or MAL_SUCCEED upon success.
#include "udf.h"

static str
reverse(const char *src)
{
    size_t len;
    str ret, new;

    /* The scalar function returns the new space */
    len = strlen(src);
    ret = new = GDKmalloc(len + 1);
    if (new == NULL)
        return NULL;
    new[len] = 0;
    while (len > 0)
        *new++ = src[--len];
    return ret;
}

str
UDFreverse(str *ret, str *src)
{
    if (*src == 0 || strcmp(*src, str_nil) == 0)
        *ret = GDKstrdup(str_nil);
    else
        *ret = reverse(*src);
    return MAL_SUCCEED;
}

step 5. The next step is to administer the routine in the SQL catalog. This calls for an SQL statement to be executed once for each database. The autoload method can relieve you from loading the modules manually in the server after each restart. The UDF template contains the files 80_udf.sql and 80_udf.mal. The former contains the definition needed for SQL:
create function reverse(src string)
returns string external name udf.reverse;

step 6. The MAL interpreter should be informed about the linked-in functionality. This is facilitated using an autoload feature too. The MAL script simply contains the module signature:

include udf;

step 7. After all pieces are prepared, you have to call the bootstrap program in the root of your checked out source tree once. Thereafter a configure/make/make install attempts compilation and places the interface files and libraries in the proper place.

Creation of bulk and polymorphic operations requires much more care. In general, it is best to find an instruction that is already close to what you need. Clone it, expand it, compile it, and test it. A bulk variation of the reverse operation is included in the sample UDF template. As a last resort you can contact us on the mailing lists for further advice.
 

Cluster Management


Topic: how to set up a cluster of three MonetDB servers belonging to the same group.

Building a distributed system using the MonetDB infrastructure requires a number of machines that run monetdbd (see its man-page for details). Assume we have three worker machines, called cruquius, lijnden and leeghwater. Provided MonetDB/SQL has been compiled and installed on each of the workers, perform the following steps on each worker to get monetdbd running:

worker% monetdbd create /fullpath/to/mydbfarm
worker% monetdbd set passphrase=mypassphrase /fullpath/to/mydbfarm
worker% monetdbd start /fullpath/to/mydbfarm

This creates the location for the database on each worker and sets the private control passphrase for the MonetDB daemon to "mypassphrase". The final step is to activate the daemon itself. This is basically all you need to initialise a cluster of MonetDB instances.

Assume you have likewise initialized MonetDB on your desktop, named willem, from which you will control the MonetDB instances. On the desktop machine you can use the following command to show which machines guarded with monetdbd can be reached:

willem% monetdb discover

Amongst others it will show you the workers you just initialized. From our machine willem we can further work with the cluster by addition of a database called pump on each node as follows:

willem% monetdb -h cruquius -P mypassphrase create pump
willem% monetdb -h lijnden -P mypassphrase create pump
willem% monetdb -h leeghwater -P mypassphrase create pump

The discover command will not show them yet, as they are still in maintenance mode.

Beware, these three identically named databases are not related to each other in any way, i.e. access control and database content can be different for each individual database pump. They are logical names to be refined using role tags to designate their function. Suppose we use two server instances to act as harvey[1,2] and one as fox. How these functionalities behave is beyond the control of the MonetDB daemon. It suffices to set up the shared property tag to designate them all to belong to the same group.

willem% monetdb -h cruquius -P mypassphrase set shared=pump/2/harvey pump
willem% monetdb -h lijnden -P mypassphrase set shared=pump/2/fox pump
willem% monetdb -h leeghwater -P mypassphrase set shared=pump/1/harvey pump

Now that we have set them up, they can be made available for use:

willem% monetdb -h cruquius -P mypassphrase release pump
willem% monetdb -h lijnden -P mypassphrase release pump
willem% monetdb -h leeghwater -P mypassphrase release pump

An mclient connection can now go to any of the hosts directly, or use a pattern to find the right machine. Given the example above with our tags, for instance the following two mclient calls connect to lijnden and round-robin to cruquius and leeghwater, respectively:

willem% mclient -d '*/pump/2'
willem% mclient -d '*/pump/*/harvey'

Data Partitioning


The prime method to enable finer control of locality of data access during query evaluation is to horizontally partition the tables in a database. This is made possible by introducing the concept of a MERGE TABLE, which allows a table to be defined as the union of its partitions. Running the ANALYZE command over a MERGE TABLE aids the optimizer in weeding out all partitions that cannot contribute to a query result using the min/max values available.
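
For example, once the merge table mt1 from the next subsection exists, its statistics can be refreshed as sketched below (the exact ANALYZE options vary across MonetDB versions):

    ANALYZE sys.mt1;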

Creating MERGE TABLE-s

The SQL statements below show how to create a MERGE TABLE mt1 containing two partitions t1 and t2. Then the partition tables can be queried both individually and jointly.

    -- Create partition tables with identical column types, and add some values.
    -- The column names are not significant.
    CREATE TABLE t1 (i int);
    CREATE TABLE t2 (j int);
    INSERT INTO t1 VALUES (11), (13);

    INSERT INTO t2 VALUES (23), (27);

    -- Create a MERGE TABLE
    CREATE MERGE TABLE mt1 (t int);

    -- Add the partition tables into the MERGE TABLE
    ALTER TABLE mt1 ADD TABLE t1;
    ALTER TABLE mt1 ADD TABLE t2;

    sql> -- sanity check
    sql> SELECT count(*) FROM t1;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (2.231ms)
    sql> SELECT count(*) FROM t2;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (2.226ms)
    sql> SELECT count(*) FROM mt1;
    +------+
    | L1   |
    +======+
    |    4 |
    +------+
    1 tuple (3.455ms)

Two MERGE TABLE-s can contain overlapping partition tables. A MERGE TABLE can also contain other MERGE TABLE-s:

    CREATE TABLE t3 (k int);
    INSERT INTO t3 VALUES (31), (37);

    CREATE TABLE t4 (l int);
    INSERT INTO t4 VALUES (41), (47);

    -- An overlapping MERGE TABLE (with mt1)
    CREATE MERGE TABLE mt2 (t int);

    ALTER TABLE mt2 ADD TABLE t1;
    ALTER TABLE mt2 ADD TABLE t3;

    -- A MERGE TABLE of MERGE TABLE
    CREATE MERGE TABLE mt3 (t int);
    ALTER TABLE mt3 ADD TABLE mt1;

    ALTER TABLE mt3 ADD TABLE t4;

    sql> -- sanity check
    sql> SELECT * FROM mt2;

    +------+
    | t    |
    +======+
    |   11 |
    |   13 |
    |   31 |
    |   37 |
    +------+
    4 tuples (2.677ms)
    sql> SELECT * FROM mt3;
    +------+
    | t    |
    +======+
    |   11 |
    |   13 |
    |   23 |
    |   27 |
    |   41 |
    |   47 |
    +------+
    6 tuples (2.496ms)

Updating MERGE TABLE-s

Currently, a MERGE TABLE cannot be directly updated. All updates must be done on the individual partition tables:

    sql> -- All updates must go directly into the individual partition table
    sql> INSERT INTO mt1 VALUES(1);
    INSERT INTO: cannot insert into merge table 'mt1'
    sql> UPDATE mt1 SET t=10 where t=11;
    UPDATE: cannot update merge table 'mt1'
    sql> UPDATE t1 SET i=10 where i=11;
    1 affected row (4.081ms)

One can remove a partition table from a MERGE TABLE using an ALTER TABLE statement. This also enables dropping a partition table, while keeping the merge table.

    -- Remove a partition table from a MERGE TABLE
    ALTER TABLE mt1 DROP TABLE t1;

    sql> -- sanity check
    sql> SELECT count(*) FROM mt1;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (1.455ms)

When dropping a partition table, all MERGE TABLE-s that depend on it must be dropped first:

    sql> -- Drop table must happen in the correct order of dependency
    sql> DROP TABLE t2;
    DROP TABLE: unable to drop table t2 (there are database objects which depend on it)
    sql> DROP TABLE mt1;
    DROP TABLE: unable to drop table mt1 (there are database objects which depend on it)
    sql> DROP TABLE mt3;
    operation successful (3.048ms)
    sql> DROP TABLE mt1;
    operation successful (1.948ms)
    sql> DROP TABLE t2;
    operation successful (2.329ms)

Distributed Query Processing


Distributed query processing was introduced in the Jul2015 release of MonetDB through the support for remote tables. The remote table technique complements merge table by allowing the partitions of a merge table to reside in different databases. Queries involving remote tables are automatically split into subqueries by the master database and executed on remote databases. The combination of merge table and remote table enables fine control of how to distribute data and query workloads to maximally benefit from the available CPU and RAM resources.

Setting up REMOTE TABLE-s

Remote table adopts a straightforward master-worker architecture: one can place the partition tables in different databases, each served by a worker MonetDB server, and then glue everything together in a MERGE TABLE in the master database served by the master MonetDB server. Each MonetDB server can act as both a worker and a master, depending on the roles of the tables it serves.

The following shell commands and SQL queries show how to place the partition tables t1 and t2 on two worker databases, and glue them together on a master database. The format of the address of a REMOTE TABLE is: mapi:monetdb://<host>:<port>/<dbname>, where all three parameters are compulsory. A worker database can reside on either a local cluster node or a remote machine. N.B.:

  1. The declaration of a REMOTE TABLE must match exactly the signature of its counterpart in the remote database, i.e., the same table name, the same column names and the same column data types.
  2. Currently, at the creation time of a remote table, the remote database server is not contacted to verify the existence of the table. When a CREATE REMOTE TABLE reports "operation successful", it merely means that the information about the new remote table has been added to the local SQL catalogue. The check at the remote database server is delayed until the first actual query on the remote table.

    -- Start a server and client for worker-database1, and create partition table1
    $ mserver5 --dbpath=<path-to>/rt1 --set mapi_port=50001
    $ mclient -d rt1 -p 50001
    sql> CREATE TABLE t1 (i int);
    sql> INSERT INTO t1 VALUES (11), (13);

    -- Start a server and client for worker-database2, and create partition table2
    $ mserver5 --dbpath=<path-to>/rt2 --set mapi_port=50002
    $ mclient -d rt2 -p 50002
    sql> CREATE TABLE t2 (j int);
    sql> INSERT INTO t2 VALUES (23), (27);

    -- Start a server and client for the master-database,
    --   and create a MERGE TABLE containing two REMOTE TABLEs
    $ mserver5 --dbpath=<path-to>/mst
    $ mclient -d mst
    sql> CREATE MERGE TABLE mt1 (t int);
    sql> -- Identify t1, t2 as REMOTE TABLEs with their locations
    sql> CREATE REMOTE TABLE t1 (i int) on 'mapi:monetdb://localhost:50001/rt1';
    sql> CREATE REMOTE TABLE t2 (j int) on 'mapi:monetdb://localhost:50002/rt2';
    sql> -- Add the remote tables into the MERGE TABLE
    sql> ALTER TABLE mt1 ADD TABLE t1;
    sql> ALTER TABLE mt1 ADD TABLE t2;
    sql> -- Sanity check:
    sql>SELECT count(*) from t1;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (9.389ms)
    sql>SELECT count(*) from t2;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (8.520ms)
    sql>SELECT count(*) from mt1;
    +------+
    | L1   |
    +======+
    |    4 |
    +------+
    1 tuple (10.481ms)

Database Sampling


Sampling a database is an essential step to improve the response time for database exploration. In the context of the SciBORQ project, we have introduced a number of sampling techniques in the MonetDB software stack. Our goal is to provide methods for performing sampling over a) the result of a query, b) the base tables, and c) the entire database schema. Sampling can be performed during query execution, as well as during data loading in the case of predefined sampling indexes. Eventually, we want to support both uniform and weighted sampling, but in the existing release all sampling methods are uniform. The sampling methods included in the distribution are described below.

Sampling operator
A new SQL operator SAMPLE has been added to support sampling the result of a query.

sample_column:
    SELECT ... FROM ... WHERE ... SAMPLE <expr>

If <expr> is a non-negative integer literal, it defines the number of rows to be included in the sample. If <expr> is a real literal between 0.0 and 1.0, it refers to the fraction of the result set to be sampled. For example, if <expr> is 0.3, then the sample will contain 30% of the rows in the query result.
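
For instance, the two queries below draw an absolute and a relative sample, respectively (the table and column names are illustrative):

SELECT * FROM lineitem WHERE quantity > 10 SAMPLE 100;
SELECT * FROM lineitem WHERE quantity > 10 SAMPLE 0.3;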

Sampling base tables or subquery results
Because SAMPLE primarily operates on query results, it is treated as the same type of operator as the LIMIT clause, which, according to the SQL:2003 standard, may only be used in the outermost SELECT clause. So, before the Jul2017 release, SAMPLE is not allowed in a subquery; in addition, the SAMPLE operator does not operate on query input data. However, both restrictions can be circumvented using a table producing function, for example

CREATE FUNCTION mysample ()
RETURNS TABLE(col a,...)
BEGIN
   RETURN
     SELECT a,...
     FROM name_table
     SAMPLE 100;
end;

Then one can use the function mysample() to create a sampled table, for instance:

INSERT INTO sample_table (SELECT * FROM mysample());

In this way, we can apply SAMPLE on base tables, before running the actual query.

Uniform sampling implementation
The current sampling methods all use uniform sampling, which is based on Algorithm A as described in the paper "Faster Methods for Random Sampling" by Jeffrey Scott Vitter [1]. Algorithm A is not the fastest one, but it only makes <expr> calls to the function random() and it is simpler than the other, more complex and CPU intensive, algorithms in the literature. Instead of performing one random experiment for each row to decide whether it should be included in the sample or not, Algorithm A computes how many rows to skip and includes the next row found. The algorithm scans the input relation sequentially and maintains the uniqueness and sort properties. The sample is without replacement.

Database migration


The easiest method to move a MonetDB database to a new location, or to create a copy for back-up purposes, runs as follows. First lock the database (putting it in maintenance mode) by calling the monetdb management tool

bash> monetdb lock dbname

When the database is locked, you can still connect as a database administrator to ensure that all (update) queries have ceased execution (see the system monitor). Once satisfied that the activity has ceased, you can either follow the safe route to dump/restore a database, or perform a file-level rsync or rcp to copy the complete <dbfarm>/<dbname> directory to another dbfarm location. The MonetDB daemon process attached to the remote <dbfarm> will pick it up and make it visible for connections. Finally, perform the following operation to continue processing user requests at the original server.

bash> monetdb release dbname
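
For illustration, the file-level copy in the step above (performed while the database is still locked, before the release) could be done with rsync as sketched here; the paths and host name are placeholders for your own setup:

bash> rsync -av /fullpath/to/mydbfarm/dbname backuphost:/fullpath/to/mydbfarm/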

Loading Bulk Data


There are many ways in which you can insert data into an SQL table. Using the best method for your application makes the difference between a slow and a pleasantly fast experience. The pros and cons of the various methods supported by MonetDB are explained here.

Simple value insertion

The easiest way to insert data into a table defined over three integer columns is to use the INSERT INTO SQL query:

INSERT INTO table_name VALUES (1, 2, 3);

This method is well suited for low volume inserts. However, it is a seriously inefficient way of doing things in MonetDB when you intend to load thousands or even millions of tuples at the same time. MonetDB, by default, runs in auto-commit mode. This means that each SQL insert query is a full-fledged database transaction: each query changes the database and thus necessarily causes a write to the disk to safeguard the change. This can become a serious bottleneck for hard disk drives. A typical 7,200 rpm SATA drive can do about 75 to 100 I/O operations per second (see Wikipedia). Since each INSERT is at least one I/O operation (depending on, among others, the underlying operating system), this limits throughput severely. Using an SSD instead improves the situation somewhat.

A much better way to do lots of inserts is to make sure the inserts are bundled into a single transaction. This can easily be done by surrounding the inserts with a START TRANSACTION and COMMIT, as in:

START TRANSACTION;
INSERT INTO table VALUES (1, 2, 3);
...
COMMIT;

Another way to accomplish the same result is to clear the auto-commit mode in the server. How this is done depends on how you connect to the server. In ODBC this can be done with a call to the function SQLSetConnectAttr(), in JDBC it can be done with a call to the setAutoCommit() method. In the mclient tool, it can be done when using it interactively with the \a command or by using the -a command line option.
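
For JDBC, a minimal sketch looks as follows (the connection URL, credentials and table name are placeholders, not a prescribed setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SingleTransactionInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; older drivers may need
        // Class.forName("nl.cwi.monetdb.jdbc.MonetDriver") first.
        try (Connection con = DriverManager.getConnection(
                "jdbc:monetdb://localhost:50000/demo", "monetdb", "monetdb")) {
            con.setAutoCommit(false);          // bundle the inserts in one transaction
            try (Statement st = con.createStatement()) {
                st.executeUpdate("INSERT INTO table_name VALUES (1, 2, 3)");
                st.executeUpdate("INSERT INTO table_name VALUES (4, 5, 6)");
            }
            con.commit();                      // a single safeguard-to-disk for all inserts
        }
    }
}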

Note that the SQL variant

INSERT INTO table VALUES (1, 2, 3), (3, 4, 5), (6, 7, 8);

is also supported by MonetDB. However, this variant is not to be recommended for large numbers of tuples either. It uses a lot of stack space on the server, and stack space is a very limited commodity. In other words, if you try to insert too many tuples in one go, the query will fail, or worse, the server will crash.

A major cost factor in all insertion statements is deciphering the call structure and turning the literals into the internal format. The overhead of the former can be removed with a slightly faster way of inserting data using a prepared query. In ODBC you would first call:

SQLPrepare(hdl, "INSERT INTO table VALUES (?, ?, ?)", SQL_NTS);

Then you  need to call SQLBindParameter() for each of the parameters in the query, and finally call

SQLExecute(hdl);

for each insert.  As before, all insert queries should be done in a single transaction.

In JDBC a similar sequence using a PreparedStatement can be used. Also consider using the JDBC batching functionality.
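
A minimal sketch of the JDBC route (again, the URL, credentials, table name and batch size are placeholders) could look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:monetdb://localhost:50000/demo", "monetdb", "monetdb")) {
            con.setAutoCommit(false);
            try (PreparedStatement ps =
                     con.prepareStatement("INSERT INTO table_name VALUES (?, ?, ?)")) {
                for (int i = 0; i < 100000; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, 2 * i);
                    ps.setInt(3, 3 * i);
                    ps.addBatch();             // collect the insert on the client side
                    if (i % 10000 == 9999)
                        ps.executeBatch();     // ship a batch of 10000 inserts
                }
                ps.executeBatch();             // ship the remaining inserts
            }
            con.commit();                      // one transaction for the whole load
        }
    }
}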

Error handling

Unfortunately, input files may contain errors. They are collected in a predefined table that contains a copy of the failed line and its line number in the source file. The rejects table should be explicitly cleared before you load a new file.

SELECT * from sys.rejects;

CALL sys.clearrejects();

Parallel data insertion

The methods discussed so far share another disadvantage: even if the server has multiple CPUs/cores, the server will only use one at a time for this simple query. A better way is to use the COPY INTO statement. COPY INTO is a single query that takes a complete file (or stream) of comma-separated values (CSV or user defined delimiters) and inserts the data in one go using the system cores in parallel. Since COPY INTO is a single query, whether or not auto-commit mode is enabled doesn't matter. What matters is the intended use of the bulk loaded data. If you plan to use the data only during a single session or for in-memory processing, then use CREATE GLOBAL TEMPORARY TABLE ... ON COMMIT PRESERVE ROWS: it avoids the I/O needed to securely store the data on disk for later use. For most cases, though, using CREATE TABLE, despite being slower upon first load, gives great benefits when you continue querying in the future.
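
As an illustration of the temporary-table route, a session-local staging table could be bulk loaded as sketched below (the column layout and file name are placeholders; note that global temporary tables live in the tmp schema):

CREATE GLOBAL TEMPORARY TABLE staging (i INT, s STRING, d DECIMAL(5,2))
    ON COMMIT PRESERVE ROWS;
COPY INTO tmp.staging FROM '/tmp/data.csv';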

COPY INTO comes in many variants.  The easiest form is:

COPY INTO table FROM 'file';

The 'file' must be an SQL string that contains the absolute path name of the file to be inserted. The file is read directly by the server, and so has to be accessible to the server's file-system.  By default, the field delimiter is '|', i.e. a vertical bar, and the record delimiter is '\n', i.e. a newline. The delimiters can be changed:

COPY INTO table from 'file' USING DELIMITERS '|','\n';

Often CSV-encoded files use quote characters to surround field values. By default, no quote characters are recognized, but by using the following form, they will be recognized:

COPY INTO table from 'file' USING DELIMITERS '|','\n','"';

When using quote characters, the other delimiters can be part of the value, and the quote character itself can be part of the value if it is preceded by a \ (backslash) or if it is doubled.  The backslash itself must also be doubled.
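
For example, with '"' as the string quote, an input record such as the one below (for an illustrative three-column table):

1|"a|b and say ""hi"""|3.14

loads the values 1, a|b and say "hi", and 3.14: the embedded field separator is protected by the quotes, and the doubled quote character produces a single quote in the stored value.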

Often NULL values need to be inserted as well.  By default, a NULL value is indicated by using NULL in the CSV file, but often the actual value used is different, e.g. 'no value'.  This can be indicated by using:

COPY INTO table from 'file' USING DELIMITERS '|','\n','"' NULL AS '';

When the data is not directly accessible to the server but is accessible to the client, the 'file' in the above queries can be replaced by STDIN:

COPY INTO table FROM STDIN;

When using STDIN, the contents of the CSV file should follow the query.  This form is not directly usable in ODBC or JDBC, but see SQLcopyinto.java for an example of how to use this from Java.  If using the mclient tool and the CSV data is in a file by itself, you can use the command (also see the mclient manual):

mclient -d database  -s  "COPY  INTO  table  FROM  STDIN  USING  DELIMITERS ',','\\n','\"'" - < file

The COPY INTO forms mentioned so far share one important drawback.  When a lot of data is to be inserted, the server doesn't know how much memory to allocate for the tables, and so will likely allocate too little.  This means that during the insertion process, the server has to grow the allocated memory area.  This can be an expensive operation.  Therefore, it is better to give the server a count of how many records are to be inserted:

COPY n RECORDS INTO table FROM 'file';

Here n should be a number that is at least as large as the actual number of records to be inserted. If the file contains more than n records, only n will be inserted; if the file contains fewer, all values will be inserted. Giving a higher number is especially useful if multiple COPY INTO queries are to be done on the same table. The first COPY INTO, when the table is still empty, should be given the total count of to-be-inserted values so that the server will allocate enough memory when the tables are first created (they are only really created once data is inserted). This form is the fastest way of inserting data into MonetDB.

One more elaboration of the COPY INTO form is to specify an offset into the CSV file.  Often CSV files contain header information that is not part of the actual data.  This header information can be skipped by using the form

COPY n OFFSET m RECORDS INTO table FROM 'file';

Here n is the number of records to insert and m is the first record to be inserted where the first record in the file is called 1 (i.e. counting is 1 based).

For completeness, if the number of records is unknown but there is a header that needs to be skipped, the form is

COPY OFFSET m INTO table FROM 'file';

CSV Bulk Loads


The COPY INTO command enables fast insertion of multiple tuples from a text file or the standard input. Each tuple in the input file is terminated by a record-separator (default '\n') and fields are separated by the field separator (default '|' ). The field values should adhere to the syntax for value literals. Alternative separators should adhere to the SQL lexical syntax for string values. A different NULL value representation can be specified using the NULL AS null_string option. Furthermore, the fields are optionally enclosed with a user defined quote character. The text file should use UTF-8 encoding if specified as file_name, and the same encoding as mclient is used if data are read FROM STDIN.

The input files must be accessible by the database server. Therefore, they must reside on or be accessible to the machine on which the database server is running. Also, the input files must be identified with an absolute path name. The STDIN file designator reads data streaming from the client application. The end of file (EOF) character signifies the end of the sequence.

The full syntax for the COPY INTO command is the following:

COPY [ int_val [ OFFSET int_val ] RECORDS ] INTO table_name
          [ '(' column_name ',' ... ')' ] FROM
          ( file_name ',' ... | STDIN ) [ '(' header [ STRING ] ',' ... ')' ]
          [ [USING] DELIMITERS  field_separator
            [',' record_separator [ ',' string_quote ]]]
          [ NULL [AS] null_string ] [ LOCKED ] [ BEST EFFORT ] [ NO CONSTRAINT ]

Note that the various features must be specified in the order shown above. For instance the query:

sql>COPY INTO tbl FROM STDIN BEST EFFORT NULL AS '';
syntax error, unexpected sql NULL, expecting SCOLON in: "copy into tbl from stdin best effort null"

is not accepted by the SQL parser because the BEST EFFORT feature is specified before the NULL AS feature.

COPY INTO from different input sources

COPY INTO is able to read data from different input sources, including standard input, a single CSV file, multiple CSV files and compressed CSV files. For each example, we create the following table to start with:

sql>CREATE TABLE tbl (i INT, s STRING, d DECIMAL(5,2));
operation successful (3.719ms)

From standard input

The most straightforward way is to read data from the standard input STDIN:

sql>COPY INTO tbl FROM STDIN;
more>1|abc|0.1
more>2|def|0.2
more>3|ghi|0.3
more>^D
3 affected rows (116.271ms)

sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | abc  |    0.10 |
|    2 | def  |    0.20 |
|    3 | ghi  |    0.30 |
+------+------+---------+
3 tuples (3.633ms)

From a CSV file

Assume data are stored in a file named /tmp/example1.csv:

$ cat /tmp/example1.csv
1|dfs|3.1
2|ewe|3.23
3|erw|2.43

We can load the data using the following COPY INTO query:

sql>COPY INTO tbl FROM '/tmp/example1.csv';
3 affected rows (47.725ms)
sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
+------+------+---------+
3 tuples (3.287ms)

From multiple files

If the data are distributed across multiple files and have the same schema, one can load them all at once. Assume we have the following example files /tmp/example2.1.csv and /tmp/example2.2.csv:

$ cat /tmp/example2.1.csv
1|dfs|3.1
2|ewe|3.23
3|erw|2.43
$ cat /tmp/example2.2.csv
4|dfs|3.2
5|ewe|3.3
6|erw|2.3

we can use them in the COPY INTO command as follows:

sql>COPY INTO tbl FROM '/tmp/example2.1.csv', '/tmp/example2.2.csv';
6 affected rows (152.470ms)
sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
|    4 | dfs  |    3.20 |
|    5 | ewe  |    3.30 |
|    6 | erw  |    2.30 |
+------+------+---------+
6 tuples (2.362ms)

From compressed files

One can pass compressed files to COPY INTO to be loaded. Supported compressions include bzip2, gzip, lzma and snappy.

First, let us compress our example CSV file:

$ bzip2 -k /tmp/example1.csv
$ gzip -k /tmp/example1.csv
$ xz  -k /tmp/example1.csv
$ ls /tmp/example1.csv.*
/tmp/example1.csv.bz2    /tmp/example1.csv.gz /tmp/example1.csv.xz

Now, we can load all three files in one COPY INTO statement:

sql>COPY INTO tbl FROM '/tmp/example1.csv.bz2', '/tmp/example1.csv.gz', '/tmp/example1.csv.xz';
9 affected rows (93.635ms)
sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
+------+------+---------+
9 tuples (1.903ms)

Specifying the number of records

It is strongly recommended to specify the maximum number of records to be inserted. This allows the database server to allocate enough space for the table in advance, so that it can avoid guessing and subsequent extending of table space, which may involve potentially expensive copying. If the exact number is unknown, a (slight) overestimation is generally better for performance than an underestimation.

If the number of actual records in the input source is smaller than the number of records specified, COPY INTO will simply stop at the end of input.

If the number of actual records in the input source is larger than the number of records specified, COPY INTO will only load the number of specified records.

The example below shows how to specify the number of records (the earlier example CSV file is used). First we give an overestimation, then we give an underestimation:

sql>COPY 4 RECORDS INTO tbl FROM '/tmp/example1.csv';
3 affected rows (46.254ms)
sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
+------+------+---------+
3 tuples (3.408ms)
sql>DELETE FROM tbl;
3 affected rows (2.488ms)
sql>COPY 2 RECORDS INTO tbl FROM '/tmp/example1.csv';
2 affected rows (44.747ms)
sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
+------+------+---------+
2 tuples (4.205ms)

Specifying an offset

A portion of the input file can be skipped using the OFFSET feature. Specifying an offset n directs COPY INTO to start reading at record n. The offset of the first record is 1. More formally, the offset directive counts n unquoted record separators. Here we use a file named /tmp/example2.csv:

$ cat /tmp/example2.csv
i|s|d
1|dfs|3.1
2|ewe|3.23
3|erw|2.43

Since the first line of the file does not have the same schema as the table, it must be ignored. So, we specify that the COPY INTO should start at the second record (i.e. 'OFFSET 2'):

sql>COPY 3 OFFSET 2 RECORDS INTO tbl FROM '/tmp/example2.csv';
3 affected rows (46.947ms)
sql>SELECT * from tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | dfs  |    3.10 |
|    2 | ewe  |    3.23 |
|    3 | erw  |    2.43 |
+------+------+---------+
3 tuples (3.530ms)

Specifying columns to load and their order

By default, a CSV file must contain data for all columns specified in a table, and in the same order as when the table was created. Additional syntax is supported to load data from CSV files that do not satisfy these requirements. Here we show with several examples how this feature works. 

Scenario 1: skip column in the input file

Assume we have the following input file:

$ cat /tmp/cars.csv
Year|Make|Model
1997|Ford|E350
2000|Mercury|Cougar

However, when loading this file, we want to skip the second column containing the "Make" information:

sql>CREATE TABLE cars (y STRING, make STRING, model STRING);
operation successful (4.148ms)
sql>COPY 2 OFFSET 2 RECORDS INTO cars(y, model) FROM '/tmp/cars.csv';
2 affected rows (52.367ms)
sql>SELECT * FROM cars;
+------+------+--------+
| y    | make | model  |
+======+======+========+
| 1997 | null | E350   |
| 2000 | null | Cougar | 
+------+------+--------+
2 tuples (2.141ms)

Scenario 2: missing column and out of order

In the following CSV file, data for the column "Model" is missing, while the data for the columns "Year" and "Make" are in the reverse order as specified when the table "cars" was created:

$ cat /tmp/cars2.csv
Make|Year
Ford|1997
Mercury|2000

The following example shows how we can tell COPY INTO which columns are contained in the input CSV file and in which order:

sql>COPY 2 OFFSET 2 RECORDS INTO cars(y,make) from '/tmp/cars2.csv'(make,y);
2 affected rows (33.402ms)
sql>select * from cars;
+------+---------+-------+
| y    | make    | model |
+======+=========+=======+
| 1997 | Ford    | null  |
| 2000 | Mercury | null  | 
+------+---------+-------+
2 tuples (1.968ms)

Customise delimiters

The COPY INTO command is versatile enough to accept records in many formats. The basic idea is that each field of a record is separated from the other fields using a specified character string, and each record is separated from other records using another character string. Moreover, since string values can contain either of those character strings, they might need to be quoted so as to be read correctly.

The user can specify the format of the file with

[[USING ] DELIMITERS field_separator [, record_separator [, string_quote ]]]

syntax. The input should then comply with the following grammar:

[[[quote] [[escape]char]* [quote]] field_separator]* record_separator

Quote characters in quoted fields may be escaped with a backslash. Field and record separators can be embedded in quoted fields.

By default, fields are separated by the pipe character ('|'), records by the new line ('\n') character.

The queries below show how to i) change the default field separator to a single character; ii) change the default field separator to a string of two characters; and iii) change the default string quote:

sql>COPY INTO tbl FROM STDIN DELIMITERS ',';
more>1,abc,0.1
more>2,def,0.2
more>2 affected rows (41.766ms)

sql>COPY INTO tbl FROM STDIN DELIMITERS ';;';
more>3;;ghi;;0.03
more>4;;jkl;;0.04
more>2 affected rows (39.128ms)

sql>COPY INTO tbl FROM STDIN DELIMITERS ';;','\n','|';
more>5;;|klm|;;0.05
more>6;;|opq|;;0.06
more>2 affected rows (37.899ms)

sql>SELECT * FROM tbl;
+------+------+---------+
| i    | s    | d       |
+======+======+=========+
|    1 | abc  |    0.10 |
|    2 | def  |    0.20 |
|    3 | ghi  |    0.03 |
|    4 | jkl  |    0.04 |
|    5 | klm  |    0.05 |
|    6 | opq  |    0.06 |
+------+------+---------+
6 tuples (3.281ms)

Customise representation of the NULL value

By default the character sequence "NULL" (case insensitive) represents the NULL value:

sql>CREATE TABLE str (s STRING);
operation successful (2.712ms)
sql>COPY INTO str FROM STDIN;
more>NULL
more>null
more>Null
more>nUll
more>4 affected rows (50.040ms)

sql>SELECT * FROM str;
+------+
| s    |
+======+
| null |
| null |
| null |
| null |
+------+
4 tuples (4.095ms)
sql>SELECT * FROM str WHERE s = 'null';
+---+
| s |
+===+
+---+
0 tuples (1.817ms)
sql>SELECT * FROM str WHERE s IS NULL;
+------+
| s    |
+======+
| null |
| null |
| null |
| null |
+------+
4 tuples (2.491ms)

The NULL AS feature allows a user to specify an alternative string as the representation of the NULL value. Whenever the alternative string is read during the loading process, the NULL value is inserted into the table. Any valid string can be specified as the representation of the NULL value.

In the example below, the first COPY INTO treats an empty string ('') as the NULL value, while the second COPY INTO uses a long text to represent the NULL value, which typically can be used to prevent any text in the input from being treated as the NULL value. The keyword AS is optional.

sql>CREATE TABLE nullas (i INT, s STRING);
operation successful (9.010ms)
sql>COPY INTO nullas FROM STDIN NULL AS '';
more>1|
more>2|null
more>3|"NuLL"
more>^D
3 affected rows (40.528ms)
sql>COPY INTO nullas FROM STDIN NULL 'this-string-does-not-exist-in-the-input';
more>4|
more>5|null
more>6|NULL
more>^D
3 affected rows (43.968ms)

sql>SELECT * FROM nullas;
+------+--------+
| i    | s      |
+======+========+
|    1 | null   |
|    2 | null   |
|    3 | "NuLL" |
|    4 |        |
|    5 | null   |
|    6 | NULL   |
+------+--------+
6 tuples (1.999ms)
sql>SELECT * FROM nullas WHERE s IS NULL;
+------+------+
| i    | s    |
+======+======+
|    1 | null |
+------+------+
1 tuple (4.391ms)
sql>SELECT * FROM nullas WHERE s = 'null';
+------+------+
| i    | s    |
+======+======+
|    2 | null |
|    5 | null |
+------+------+
2 tuples (0.969ms)
sql>SELECT * FROM nullas WHERE s = '"NuLL"';
+------+--------+
| i    | s      |
+======+========+
|    3 | "NuLL" |
+------+--------+
1 tuple (3.484ms)
sql>SELECT * FROM nullas WHERE s = 'NULL';
+------+------+
| i    | s    |
+======+======+
|    6 | NULL |
+------+------+
1 tuple (3.484ms)

Ignore errors in the input with the BEST EFFORT mode

Normally if there is an error in the input, such as a missing field, COPY INTO will fail. Using the BEST EFFORT mode, COPY INTO continues parsing the input, and writes all the records that failed in the sys.rejects table.

sql>CREATE TABLE cars (y STRING, make STRING, model STRING);
operation successful (0.734ms)
sql>COPY INTO cars FROM STDIN DELIMITERS ',','\n' BEST EFFORT;
more>2000,
more>2004,Honda,Pilot
more>2001,Honda,Civic
more>,,,
more>3 affected rows (28.604ms)
sql>SELECT * FROM cars;
+------+-------+-------+
| y    | make  | model |
+======+=======+=======+
| 2004 | Honda | Pilot |
| 2001 | Honda | Civic |
|      |       |       |
+------+-------+-------+
3 tuples (2.233ms)

The example above shows that only the valid records are retained. In this scenario the first record is omitted.

sql>SELECT * FROM rejects;
+-------+-------+------------------------+--------+
| rowid | fldid | message                | input  |
+=======+=======+========================+========+
|     0 |     1 | Column value 2 missing | 2000,  |
+-------+-------+------------------------+--------+
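
The collected rejects can be cleared between loads. Recent MonetDB releases provide a procedure for this; please verify its availability in your version before relying on it:

sql>CALL sys.clearrejects();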

Tips to boost performance

Here are several tips to possibly further improve the performance of bulk data loading:

  • Add constraints after data loading: if one needs to load multiple chunks of data, it is advisable to add the integrity constraints to the table only after all files of this batch have been loaded. The ALTER statements perform bulk integrity checking, which is often more efficient than checking the constraints after each small batch of data is loaded. When adding more data to an existing table, one can temporarily remove existing constraints on the table, load the new data and finally re-add the constraints (a small sketch combining both tips follows after this list).
  • LOCKED mode for single user: in many bulk loading situations, the original data file can be saved as a backup or recreated for disaster handling. This relieves the database system from having to prepare for recovery and saves significant storage space. The LOCKED qualifier can be used in this situation to skip the logging operation that is done by default. (WARNING: this feature must be used with great care. When using the LOCKED mode, it is the responsibility of the database user to make sure that there is currently only a single client connection accessing the database. The database server will not check this.)
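
The sketch below illustrates both tips. It is a minimal, hypothetical example: the table name, file paths and record counts are made up, and the exact placement of the LOCKED qualifier should be checked against the COPY INTO grammar of your MonetDB version.

CREATE TABLE stage (id INT, payload STRING);
-- load all chunks first, without constraints on the table
COPY 1000000 RECORDS INTO stage FROM '/tmp/chunk1.csv';
COPY 1000000 RECORDS INTO stage FROM '/tmp/chunk2.csv';
-- single-user scenario with a backed-up source file: skip the default logging with LOCKED
COPY 1000000 RECORDS INTO stage FROM '/tmp/chunk3.csv' LOCKED;
-- add the constraint once, triggering a single bulk integrity check
ALTER TABLE stage ADD PRIMARY KEY (id);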

Binary Bulk Loads

Binary Bulk Loads mk Sun, 03/28/2010 - 23:29

A database is often populated with the SQL COPY statement using an ASCII representation of a table with column and record separators, e.g. a CSV file. This scheme is preferred for almost all situations. Its performance is mostly determined by the cost of parsing the ASCII value representation into the native binary one. The MonetDB copy process is highly optimized for multi-core systems, where several threads in parallel handle variable length records efficiently.

When large tables are migrated between MonetDB instances, or when large tables are produced by a trusted external program, a slightly faster loading track can be used instead. This saves rendering of the data into ASCII and subsequent parsing of the data being exchanged. For this to work, the user creates the binary version of the BAT storage and 'attaches' it to the SQL catalog using a variation of the COPY command.

To illustrate, consider the following snippet:

create table Tmp( i integer, f real, s string); 
copy binary into Tmp from ('path_to_file_i', 'path_to_file_f', 'path_to_file_s');

Each attachment file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of a C array. The SQL types char (1 byte), tinyint (8 bits), smallint (16 bits), int/integer (32 bits), and bigint (64 bits) correspond directly with C types; real and double are mapped onto the C types float and double, respectively. For variable length strings, the file must have one C-based string value per line, terminated by a newline, and it is processed without escape character conversion. Fixed length strings are handled the same way. MonetDB assumes that all files are aligned, i.e. the i-th value in each file corresponds to the i-th record in the table.

All other types (including UTF8 and escaped characters) should be handled using the default COPY INTO language construct.

The file names must be absolute paths to the column input files. They must reside on the same file system as the database farm and they will replace the content of the table Tmp. The files with numeric data are moved into place to avoid copying. String input files are retained in their original location and should be removed by the user.

WARNING: currently, upon discovering an error in the input, the files are 'gone'.

Apache Pig

Apache Pig mk Fri, 07/19/2013 - 10:31

In many cases, the data that is supposed to be stored in MonetDB is created by a large-scale data aggregation process. Apache Pig is a popular tool for this task. There is a possibility of directly generating MonetDB binary column files using the MonetDB-Pig-Layer.

Inside a Pig script, this is used as follows:

STORE raw INTO './results/' USING nl.cwi.da.monetdb.loader.hadoop.MonetDBStoreFunc;
 
This will generate a SQL schema file, a set of column files, and a load command. Keep in mind that currently, only primitive Java types such as Integer, Long, Float, Double, String etc. are supported.

Exporting Bulk Data

Exporting Bulk Data zhang Fri, 09/01/2017 - 16:11

The COPY INTO command with a file name argument allows for fast dumping of a result set into an ASCII file. The file must be accessible by the server and a full path name may be required. The file STDOUT can be used to direct the result to the primary output channel.

The delimiters and NULL AS arguments provide control over the layout required.

COPY subquery INTO file_name | STDOUT
            [ [USING] DELIMITERS field_separator
            [',' record_separator [ ',' string_quote ]]]
            [ NULL AS null_string ]

For the output 'file_name', one can specify a file name ending with 'gz', 'bz2' or 'xz' to use the appropriate compression library (if available).

sql>SELECT * FROM cars;
+------+-------+--------+
| y    | make  | model  |
+======+=======+========+
| 2000 | Ford  | Focus  |
| 2001 | Honda | Accord |
| 2004 | Honda | Pilot  |
+------+-------+--------+
3 tuples (1.527ms)
sql>COPY SELECT * FROM cars INTO '/tmp/cars_out.csv' DELIMITERS ',','\n';
3 affected rows (2.646ms)
sql>COPY SELECT * FROM cars INTO '/tmp/cars_out.gz' DELIMITERS ',','\n';
3 affected rows (3.723ms)
sql>COPY SELECT * FROM cars INTO '/tmp/cars_out.bz2' DELIMITERS ',','\n';
3 affected rows (5.729ms)
sql>COPY SELECT * FROM cars INTO '/tmp/cars_out.xz' DELIMITERS ',','\n';
3 affected rows (4.037ms)

Let us check some of the results:

$ cat /tmp/cars_out.csv
"2000","Ford","Focus"
"2001","Honda","Accord"
"2004","Honda","Pilot"
$ bzip2 -d cars_out.bz2 ; cat cars_out
"2000","Ford","Focus"
"2001","Honda","Accord"
"2004","Honda","Pilot"

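The NULL AS clause from the grammar above also applies to exports. A small sketch combining STDOUT with an empty-string NULL representation (the output appears directly in the client session and is not shown here):

sql>COPY SELECT * FROM cars INTO STDOUT DELIMITERS ',','\n' NULL AS '';
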
Optimizer Pipelines

Optimizer Pipelines mk Sat, 06/05/2010 - 08:20

The code produced by MonetDB/SQL is massaged by an optimizer pipeline.  The pipeline to be used is identified by the SQL global variable optimizer, which can be modified using a SQL assignment, e.g. the minimal optimizer pipeline is set using:

set optimizer='minimal_pipe';
sql>select optimizer;
+--------------+
| single value |
+==============+
| minimal_pipe |
+--------------+

Each pipeline consists of a sequence of MAL function calls that inspect and transform the plan for better execution. The preferred optimizer can be changed by either providing its name or the complete MAL function sequence. For example, the minimal pipe can also be set using:

sql> set optimizer='optimizer.inline();optimizer.remap();optimizer.deadcode();optimizer.multiplex();optimizer.garbageCollector();';

The final result of the optimizer steps becomes visible using the SQL EXPLAIN statement modifier. Alternatively, the SQL DEBUG statement modifier in combination with the 'o' command provides access to the intermediate optimizer results.
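
For example (the MAL plan printed depends on the server version and the active optimizer pipeline, so it is not reproduced here):

sql>EXPLAIN SELECT 1;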

Several optimizer pipelines are pre-defined in the MonetDB sources. They represent ongoing development activities and experimental code. Their scope of applicability and/or stability has not reached a satisfactory level to include them in the default pipeline. The predefined optimizer pipelines can be inspected as follows:

sql>select * from optimizers();

A user-defined optimizer pipeline is checked against the dependency information maintained in the optimizer library to ensure there are no conflicts and at least the pre-requisite optimizers are used.
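
To revert to the standard behaviour, simply assign the default pipeline again ('default_pipe' is the name under which it is listed by optimizers()):

sql>set optimizer='default_pipe';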

The default SQL optimization pipeline contains the following steps:

inline Inline functions identified as such.
remap Locate hardwired multiplex operations.
costModel Inspects the SQL catalog for size information.
coercions Performs static type coercions.
evaluate Evaluate constant expressions.
emptybind Removes empty bindings
pushselect Push selections through e.g. joins.
aliases Remove alias assignments.
mitosis Horizontal fragmentation.
mergetable Expand horizontal fragmented plans.
deadcode Remove all code not leading to used results.
aliases Remove alias assignments.
constants Evaluate constant expressions.
commonterms Remove duplicate expressions.
projectionpath Searches multiple joins and glues them together for better runtime optimization.
deadcode Remove all code not leading to used results.
reorder Reorder the execution order.
matpack Recombine partitions into a single BAT.
dataflow Prepare code for multi-core execution.
querylog Keep track of SQL query definitions and invocations.
multiplex Expand all remaining multiplex operations to iterators.
generator Implement for-loop generator as table producing function.
profiler Highlight the candidate lists in the code.
candidates Highlight the candidate lists in the code.
garbageCollector Injects calls to the garbage collector to free up space.

 

Query timing

Query timing mk Sun, 10/13/2013 - 14:28

Timing a query execution is supported in multiple ways, but largely depends on what you want to measure: point-to-point wall-clock time, or the actual behavior of kernel operations.

(1) The baseline is to use simple command line tools, such as time on Linux, to assess the performance of running a script against mclient. Beware that /bin/time and /usr/bin/time are not the same; they mainly measure and report the wall-clock time spent by the given command/process. See their respective man pages for details.

(2) The next approach is to use the "--interactive" option of the mclient tool, which reports the timing of each individual SQL query in a script in easily human-readable terms. It returns the wall-clock time between sending the query to the server and receiving the first block of answers. Its rendering can be controlled (see mclient).

(3) The query history can also be maintained in a separate log for post analysis. (see description)

Thus, (1) includes everything from loading the mclient binary and starting the mclient process, parsing the query in mclient, sending it to the server, having the server execute the query and serialize the result, sending the result back to the client, to the client receiving, parsing and rendering the result, and sending the result to /dev/null ("for free"), to a file (I/O), or to a terminal (scrolling). (2) merely includes the time the server spends on receiving and executing the query and creating the result. The above-mentioned costs on the client side to receive, parse, render, etc. the result are excluded. The same holds for (3).

A detailed timing of an SQL query can be obtained by prepending the query with the modifier TRACE. It produces a queryable table with a breakdown of all relational algebra operations (see the TRACE command). The profiling tools stethoscope and tomograph provide further details for those interested in the inner workings of the system. They provide hooks to many system parameters, e.g. input/output, CPU cycles, and threads' activities.
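
For example, re-using the small tbl table from the loading recipes (the trace output depends on the generated plan and is therefore omitted):

sql>TRACE SELECT count(*) FROM tbl;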

Timing a database query should be done with care. Often you will notice differences in response time when the same query is run multiple times. The underlying cause can be that the data itself resides on disk (slow) or is already available in the memory caches (fast), or that a single user runs queries (fast) or has to compete with other users (slow), including competing with other processes on your box fighting over CPU, memory, and IO resources. As a precaution you might want to flush the system caches. On Windows a dedicated tool flushes the cache; you'll need to press the "Flush Cache WS" and "Flush All Standby" buttons. On Linux you have to create a little job that consumes all memory.

For more general information on running experiments and measuring time, see our performance tutorial.

Query History

Query History mk Sat, 03/09/2013 - 17:36

The SQL implementation comes with a simple query profiler to detect expensive queries. It is centered around two predefined internal tables that store the definitions of all executed SQL queries and their execution times.

Query logging can be started by calling the procedure querylog_enable(), which saves some major compilation information of a query in the 'querylog_catalog' table:

table sys.querylog_catalog(
   id oid,
   owner string,
   defined timestamp,
   query string,
   pipe string,
    -- Query pipeline
   "plan" string,  -- Name of MAL plan
   mal integer,    -- size of MAL plan in number of statements
   optimize bigint -- time in microseconds for optimizer pipeline
)

Query logging can be stopped by calling procedure querylog_disable().

The query performance is stored in the table 'querylog_calls'. The owner of the query definition is also the one that will be referenced implicitly from the call events. The key timing attributes are 'run', i.e. the time to prepare the result set, and 'ship', i.e. the time to render the result set and send it to the client. All timings are in microseconds.

The remaining parameters illustrate the resource claims. The 'tuples' attribute denotes the size of the result set in number of rows. The 'space' attribute depicts the total size of all temporary columns created during query execution. Note that this space is allocated and freed during the query execution, so the actual memory footprint is much less demanding. The 'cpu' load is derived from the operating system statistics (Linux only) and is given as a percentage. The same holds for the io waiting time.

table sys.querylog_calls (
    id oid, -- references query plan
    "start" timestamp,-- time the statement was started
    "stop" timestamp,-- time the statement was completely finished
    arguments string, -- actual call structure
    tuples bigint, -- number of tuples in the result set
    run bigint, -- time spent (in usec) until the result export
    ship bigint, -- time spent (in usec) to ship the result set
    cpu int, -- average cpu load percentage during execution
    io int -- percentage time waiting for IO to finish
)

create view sys.querylog_history as
    select qd.*, ql."start",ql."stop", ql.arguments, ql.tuples, ql.run, ql.ship, ql.cpu, ql.space, ql.io
    from sys.querylog_catalog() qd, sys.querylog_calls() ql
    where qd.id = ql.id and qd.owner = user;

The following code snippet illustrates its use.

sql>call sys.querylog_enable();
sql>select 1;
sql>select 1;
sql>select * from sys.querylog_catalog;
sql>select * from sys.querylog_calls;
sql>select * from sys.querylog_history;
sql>select id, query, avg(run) from sys.querylog_history group by id,query;
sql>call sys.querylog_disable();
sql>call sys.querylog_empty();

 

Storage Model

Storage Model mk Fri, 10/26/2012 - 13:00

The storage footprint for any given database schema can be obtained by inspecting the table producing function storage(). To illustrate, consider the storage characteristics of the lineitem table in TPCH SF-10. The column width for variable length strings, e.g. l_shipinstruct and l_comment, represents the average length found using a sample over the underlying column storage.
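
For example, the current footprint of a single table can be listed directly (output omitted; the exact column set returned by storage() may vary between releases):

sql>select * from storage() where "table" = 'lineitem';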

To estimate the storage footprint for a variation of this schema, we first construct a relation with the model input. Thereafter it can be updated to reflect the expected database size and varwidth properties. The footprint then becomes available through the table producing function storagemodel(). Also be aware that dictionary encoding may have taken place, which leads to a much smaller storage footprint (see l_shipinstruct). Dictionary encoding is currently a runtime feature; it depends on the insertion sequence and the dictionary size, and it may lead to an overshoot in the estimated size. Therefore, the size of varchar columns should be taken with a grain of salt.

sql>call storagemodelinit();
sql>update storagemodelinput set  count = 1000000 where "table"='lineitem';
sql>update storagemodelinput set  "distinct" = 1000 where "table"='lineitem' and "type"='varchar';
sql>update storagemodelinput set  "distinct" = 330000 where "table"='lineitem' and "column"='l_comment';
sql>select * from storagemodel() where "table" = 'lineitem';

 

System monitor

System monitor mk Sun, 03/17/2013 - 20:39

The MonetDB kernel maintains an active queue of all running queries. This queue is available for all users to inspect the status of their own queries. The system administrator can inspect it to oversee the complete workload on the system. The queue is made visible as a table producing function, called sys.queue().

sql>select * from sys.queue();
+------+---------+----------------------------+----------+----------+---------+------------+----------------------------+
| qtag | user    | started                    | estimate | progress | status  | tag        | query                      |
+======+=========+============================+==========+==========+=========+============+============================+
|  902 | monetdb | 2013-03-30 10:23:39.000000 | null     |     null | running | 15168688@0 | select * from sys.queue(); |
+------+---------+----------------------------+----------+----------+---------+------------+----------------------------+
1 tuple (0.446ms)

The schema structure is largely self-explanatory. If the query (template) is run multiple times, then the system can derive a progress indicator and calculate an estimated time of completion. The 'tag' column references the query log tables, provided this facility has been turned on. The initial column 'qtag' provides a key to each active query.

One can use the SQL functions PAUSE, RESUME and STOP to control the execution of (long) running queries.

NOTE that PAUSE and STOP take effect at the first safe point within the query plan, which is often after the current MAL instruction has finished. For complex queries over large databases this may take seconds up to minutes.

sql> select * from sys.queue;
+------+---------+----------------------------+----------+----------+---------+-----------+--------------------------+
| qtag | user    | started                    | estimate | progress | status  | tag       | query                    |
+======+=========+============================+==========+==========+=========+===========+==========================+
|   52 | monetdb | 2013-12-23 12:34:50.000000 | null     |     null | running | 1349763@0 | select * from sys.queue; |
+------+---------+----------------------------+----------+----------+---------+-----------+--------------------------+
1 tuple (2.242ms)

--- start a long running query in another window
sql>select * from sys.queue;
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
| qtag | user    | started                    | estimate | prog | status  | tag       | query                                                       |
:      :         :                            :          : ress :         :           :                                                             :
+======+=========+============================+==========+======+=========+===========+=============================================================+
|   56 | monetdb | 2013-12-23 12:35:03.000000 | null     | null | running | 1353354@0 | select                                                      |
:      :         :                            :          :      :         :           :  sum(l_extendedprice * l_discount) as revenue               :
:      :         :                            :          :      :         :           : from                                                        :
:      :         :                            :          :      :         :           :  lineitem                                                   :
:      :         :                            :          :      :         :           : where                                                       :
:      :         :                            :          :      :         :           :  l_shipdate >= date \'1994-01-01\'                          :
:      :         :                            :          :      :         :           :  and l_shipdate < date \'1994-01-01\' + interval \'1\' year :
:      :         :                            :          :      :         :           :  and l_discount between 0.06 - 0.01 and 0.06 + 0.01         :
:      :         :                            :          :      :         :           :  and l_quantity < 24;                                       :
|   57 | monetdb | 2013-12-23 12:35:10.000000 | null     | null | running | 1349763@0 | select * from sys.queue;                                    |
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
2 tuples (1.404ms)
sql>call sys.pause(56);
sql>select * from sys.queue; 
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
| qtag | user    | started                    | estimate | prog | status  | tag       | query                                                       |
:      :         :                            :          : ress :         :           :                                                             :
+======+=========+============================+==========+======+=========+===========+=============================================================+
|   56 | monetdb | 2013-12-23 12:35:03.000000 | null     | null | paused  | 1353354@0 | select                                                      |
:      :         :                            :          :      :         :           :  sum(l_extendedprice * l_discount) as revenue               :
:      :         :                            :          :      :         :           : from                                                        :
:      :         :                            :          :      :         :           :  lineitem                                                   :
:      :         :                            :          :      :         :           : where                                                       :
:      :         :                            :          :      :         :           :  l_shipdate >= date \'1994-01-01\'                          :
:      :         :                            :          :      :         :           :  and l_shipdate < date \'1994-01-01\' + interval \'1\' year :
:      :         :                            :          :      :         :           :  and l_discount between 0.06 - 0.01 and 0.06 + 0.01         :
:      :         :                            :          :      :         :           :  and l_quantity < 24;                                       :
|   65 | monetdb | 2013-12-23 12:36:03.000000 | null     | null | running | 1349763@0 | select * from sys.queue;                                    |
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
2 tuples (3.082ms)
sql>call sys.resume(56);
sql>select * from sys.queue; 
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
| qtag | user    | started                    | estimate | prog | status  | tag       | query                                                       |
:      :         :                            :          : ress :         :           :                                                             :
+======+=========+============================+==========+======+=========+===========+=============================================================+
|   56 | monetdb | 2013-12-23 12:35:03.000000 | null     | null | running | 1353354@0 | select                                                      |
:      :         :                            :          :      :         :           :  sum(l_extendedprice * l_discount) as revenue               :
:      :         :                            :          :      :         :           : from                                                        :
:      :         :                            :          :      :         :           :  lineitem                                                   :
:      :         :                            :          :      :         :           : where                                                       :
:      :         :                            :          :      :         :           :  l_shipdate >= date \'1994-01-01\'                          :
:      :         :                            :          :      :         :           :  and l_shipdate < date \'1994-01-01\' + interval \'1\' year :
:      :         :                            :          :      :         :           :  and l_discount between 0.06 - 0.01 and 0.06 + 0.01         :
:      :         :                            :          :      :         :           :  and l_quantity < 24;                                       :
|   65 | monetdb | 2013-12-23 12:36:03.000000 | null     | null | running | 1349763@0 | select * from sys.queue;                                    |
+------+---------+----------------------------+----------+------+---------+-----------+-------------------------------------------------------------+
2 tuples (12.320ms)
sql>call sys.stop(56);
sql>select * from sys.queue;
+------+---------+----------------------------+----------+----------+---------+-----------+--------------------------+
| qtag | user    | started                    | estimate | progress | status  | tag       | query                    |
+======+=========+============================+==========+==========+=========+===========+==========================+
|   67 | monetdb | 2013-12-23 12:36:22.000000 | null     |     null | running | 1349763@0 | select * from sys.queue; |
+------+---------+----------------------------+----------+----------+---------+-----------+--------------------------+
1 tuple (2.799ms)

Table statistics

Table statistics mk Tue, 09/16/2014 - 14:53

Statistics gathered over tables in the database can be found in the system table sys.statistics. This table is initially empty and explicitly filled or updated using the ANALYZE command:

ANALYZE schemaname [ '.' tablename [ '('columnname ...')' ] ] [SAMPLE size ] [MINMAX]

You can gather statistics for a) all tables in a schema, b) all columns of a specific table, or c) a list of specific columns of one table. You can only gather statistics for tables with real physical column data, hence not for views.

Since statistics gathering involves accessing and profiling all table columns data, it can take considerable time, especially if the tables are large or you analyze all tables in a schema. You may consider adding the optional MINMAX keyword to exclude the expensive unique-count operation. Likewise, a SAMPLE size can be used to get a quick, but imprecise, impression.
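
For example, following the grammar above (schema, table and column names are hypothetical):

ANALYZE my_schema;
ANALYZE my_schema.my_table (col1, col2) SAMPLE 10000 MINMAX;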

You can remove statistics data via the SQL command DELETE FROM sys.statistics. For example, to remove the statistics for the table my_schema.my_table:

DELETE FROM sys.statistics WHERE column_id IN (SELECT c.id FROM sys.columns c WHERE c.table_id IN (SELECT t.id FROM sys.tables t WHERE t.name = 'my_table' AND t.schema_id IN (SELECT s.id FROM sys.schemas s WHERE s.name = 'my_schema')));

 

Date and Time functionality

Date and Time functionality zhang Tue, 09/05/2017 - 12:38

MonetDB offers many functionalities for storing and working with date and time values.

For the example here we have extended the voyages table in our VOC data set with randomly filled time columns, and also created timestamp columns by appending the random times to the already existing dates. The following columns from the voyages table are used:

Column name           Data type   In original data set   Description
departure_date        date        yes                    The date at which the voyage started
departure_time        time        no                     The time at which the voyage started
departure_timestamp   timestamp   no                     Combination of departure_date and departure_time
arrival_date          date        yes                    The date at which the voyage ended
arrival_time          time        no                     The time at which the voyage ended
arrival_timestamp     timestamp   no                     Combination of arrival_date and arrival_time

The additional columns are created and populated with the following queries:

ALTER TABLE voyages ADD COLUMN departure_time TIME;
ALTER TABLE voyages ADD COLUMN departure_timestamp TIMESTAMP;
ALTER TABLE voyages ADD COLUMN arrival_time TIME;
ALTER TABLE voyages ADD COLUMN arrival_timestamp TIMESTAMP;

UPDATE voyages SET departure_time = STR_TO_TIME(RAND(), '%s');
UPDATE voyages SET arrival_time   = STR_TO_TIME(RAND(), '%s');

UPDATE voyages SET departure_timestamp = str_to_timestamp(date_to_str(departure_date, '%Y-%m-%d') || ' ' || time_to_str(departure_time, '%H:%M'), '%Y-%m-%d %H:%M');
UPDATE voyages SET arrival_timestamp   = str_to_timestamp(date_to_str(arrival_date, '%Y-%m-%d')   || ' ' || time_to_str(arrival_time, '%H:%M'), '%Y-%m-%d %H:%M');

Temporal types

In order to store and work with date and time values, MonetDB defines the following temporal data types:

  • DATE: calendar date in the Gregorian calendar, e.g. 1999-Jan-31.
  • TIME: time of day with a certain (configurable) precision and time zone.
  • TIMESTAMP: date and time data type combined.
  • INTERVAL: temporal interval, e.g. 1023.5 seconds or 31 months.

Using the current date or time

There are some SQL constants that return date and time values derived from the current date and time:

Query                       Example result                     With time zone   Result type
SELECT CURRENT_DATE;        2015-05-22                         no               date
SELECT CURRENT_TIME;        14:18:17.000000+02:00              yes              time
SELECT CURRENT_TIMESTAMP;   2015-05-22 14:18:17.000000+02:00   yes              timestamp
SELECT NOW;                 2015-05-22 14:18:17.000000+02:00   yes              timestamp
SELECT LOCALTIME;           14:18:17.000000                    no               time
SELECT LOCALTIMESTAMP;      2015-05-22 14:18:17.000000         no               timestamp

These functions are very useful for many use cases, for example, when we need to store the time a tuple was inserted into MonetDB. In that case we can create a table like this:

CREATE TABLE t (a INT, b TIMESTAMP DEFAULT NOW);

If we then insert data into this table as follows (with a short delay between executing the two statements):

INSERT INTO t (a) VALUES (3), (5);
INSERT INTO t (a) VALUES (1);

Then the tuples in t will have recorded the time they were inserted into MonetDB:

SELECT * FROM t;
+------+----------------------------+
| a    | b                          |
+======+============================+
| 3    | 2015-11-26 09:17:03.000000 |
| 5    | 2015-11-26 09:17:03.000000 |
| 1    | 2015-11-26 09:17:07.000000 |
+------+----------------------------+

A more interesting use case is counting how many voyages started more than 400 years ago:

SELECT COUNT(*) FROM voyages WHERE departure_timestamp < NOW - INTERVAL '400' YEAR;
+------+
| L1   |
+======+
| 319  |
+------+

The above query uses date computation to subtract 400 years from the current date and time. For more information about date computations, see the section below on date computation.

Generating dates and times from strings

There are many use cases where we want to have a date or a time that is not at all related to the current date or time. In that case, we can generate date and time types for MonetDB by passing it a string, along with a second string that tells MonetDB how the first string should be interpreted. This is done by using so called date/time format specifiers.

SELECT str_to_date('23-09-1987', '%d-%m-%Y') AS "date",
       str_to_time('11:40', '%H:%M') AS "time",
       str_to_timestamp('23-09-1987 11:40', '%d-%m-%Y %H:%M') AS "timestamp";
+------------+----------+----------------------------+
| date       | time     | timestamp                  |
+============+==========+============================+
| 1987-09-23 | 11:40:00 | 1987-09-23 11:40:00.000000 |
+------------+----------+----------------------------+

For a complete list of date/time format specifiers, including their meaning, see the section below on date/time format specifiers.

Extracting information from dates and times

Dates and times are stored in such a way that MonetDB can do efficient computations on them. However, sometimes this native format is not what we desire. For example, we might want to create strings from date or time values that are formatted in a way that we can define. This is where the date/time to string conversion functions come in:

SELECT departure_timestamp,
       date_to_str(departure_date, '%Y-%m-%d') AS "date",
       time_to_str(departure_time, '%H:%M') AS "time",
       timestamp_to_str(departure_timestamp, '%Y-%m-%d %H:%M') AS "timestamp"
FROM voyages LIMIT 3;
+----------------------------+------------+-------+------------------+
| departure_timestamp        | date       | time  | timestamp        |
+============================+============+=======+==================+
| 1595-04-02 10:34:46.000000 | 1595-04-02 | 10:34 | 1595-04-02 10:34 |
| 1595-04-02 00:52:57.000000 | 1595-04-02 | 00:52 | 1595-04-02 00:52 |
| 1595-04-02 08:01:55.000000 | 1595-04-02 | 08:01 | 1595-04-02 08:01 |
+----------------------------+------------+-------+------------------+

You can also extract certain portions from date and time fields, using the following keywords in combination with the EXTRACT(<keyword> FROM <date/time field>) function:

  • YEAR
  • MONTH
  • DAY
  • HOUR
  • MINUTE
  • SECOND

We can use this to compute how many years ago the first and the last recorded departure occurred:

SELECT
  MAX(EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM departure_date)) AS first,
  MIN(EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM departure_date)) AS last
FROM voyages;
+-------+------+
| first | last |
+=======+======+
| 420   | 219  |
+-------+------+

So from the year of this writing (2015), the recorded voyage departures were between 219 and 420 years ago.
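
The other keywords work analogously. A small, self-contained illustration using a literal timestamp instead of the VOC data:

SELECT EXTRACT(HOUR FROM TIMESTAMP '2015-05-22 14:18:17') AS "hour",
       EXTRACT(MINUTE FROM TIMESTAMP '2015-05-22 14:18:17') AS "minute",
       EXTRACT(SECOND FROM TIMESTAMP '2015-05-22 14:18:17') AS "second";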

Now let us use MonetDB to compute something really interesting. The table "total" contains statistics about the number of deaths on every voyage (in the columns "death_at_cape" and "death_during_voyage"). We can join this table with the voyages table and then use the EXTRACT function to group the total number of deaths by the year in which the voyage started. This result can then be used to create a top 10 of the departure years in which the most deaths occurred.

SELECT EXTRACT(YEAR FROM departure_date) AS "year"
     , SUM(death_at_cape + death_during_voyage) AS deaths
  FROM voyages JOIN total
    ON voyages.number = total.number
   AND voyages.number_sup = total.number_sup
 GROUP BY "year"
 ORDER BY deaths DESC
 LIMIT 10;
+------+--------+
| year | deaths |
+======+========+
| 1771 | 2993   |
| 1772 | 1094   |
| 1767 | 1016   |
| 1773 | 380    |
| 1766 | 307    |
| 1775 | 298    |
| 1774 | 158    |
| 1726 | 154    |
| 1619 | 151    |
| 1671 | 149    |
+------+--------+

Apparently, the period between roughly 1760 and 1780 was not the safest time to travel.

Date computation

MonetDB can do native computation on two similar date or time fields, and on a date or time field and an interval. Subtracting or adding two similar date or time fields is done by converting the operands to days in the case of date fields and to seconds otherwise. For example:

SELECT CURRENT_TIME - departure_time AS timediff_sec
     , arrival_date - departure_date AS journey_days
FROM voyages LIMIT 3;
+--------------+--------------+
| timediff_sec | journey_days |
+==============+==============+
| 7227.000     | 431          |
| 42136.000    | 431          |
| 16398.000    | 431          |
+--------------+--------------+

We can use this to get the minimum and the maximum days taken by a voyage.

SELECT MIN(arrival_date - departure_date) AS voyage_min_days,
       MAX(arrival_date - departure_date) AS voyage_max_days
FROM voyages;
+-----------------+-----------------+
| voyage_min_days | voyage_max_days |
+=================+=================+
| -218944         | 73237           |
+-----------------+-----------------+

This result leads us to believe there must be inconsistencies in the data, since negative journey times should not occur and a voyage that took about 200 years is not plausible either.

For other computations, an interval is taken as a second argument and the result is of the same type as the first argument.

We can use this to count the number of voyages that have finished within a year:

SELECT COUNT(*) FROM voyages
 WHERE arrival_date < departure_date + INTERVAL '1' YEAR;
+------+
| L1   |
+======+
| 7339 |
+------+

Time zones

Every MonetDB connection can have an associated time zone:

SELECT NOW;
+----------------------------------+
| current_timestamp                |
+==================================+
| 2015-11-26 13:46:17.000000+01:00 |
+----------------------------------+

This shows the time, assuming we are in time zone GMT +01:00.

We can change our current time zone as follows:

SET TIME ZONE INTERVAL '+03:00' HOUR TO MINUTE;

After changing the time zone, the same query now gives a different result:

SELECT NOW;
+----------------------------------+
| current_timestamp                |
+==================================+
| 2015-11-26 15:46:17.000000+03:00 |
+----------------------------------+

As you can see, it adds two hours to the displayed time compared to the previous time we executed this query.

Interval conversions

As you might have noticed in the previous section, there is an interval conversion function that takes an 'hour' string and converts it to minutes. Strictly speaking, this is an interval with arguments, and the following interval/argument combinations are possible:

  • YEAR TO MONTH
  • DAY TO SECOND
  • HOUR TO MINUTE
  • HOUR TO SECOND

Example usage:

SELECT INTERVAL '2015-09-23' YEAR TO MONTH AS y2m,
       INTERVAL '+01:00' HOUR TO MINUTE AS h2m,
       INTERVAL '+01:30:15' HOUR TO SECOND AS h2s;
+-------+----------+----------+
| y2m   | h2m      | h2s      |
+=======+==========+==========+
| 24189 | 3600.000 | 5415.000 |
+-------+----------+----------+

Date/time format specifiers

All possible format specifiers are listed below (taken from Linux Programmer’s Manual)

 %a   The abbreviated name of the day of the week according to the current locale.
 %A   The full name of the day of the week according to the current locale.
 %b   The abbreviated month name according to the current locale.
 %B   The full month name according to the current locale.
 %c   The preferred date and time representation for the current locale.
 %C   The century number (year/100) as a 2-digit integer. (SU)
 %d   The day of the month as a decimal number (range 01 to 31).
 %D   Equivalent to %m/%d/%y. (Yecch—for Americans only. Americans should note that in other countries %d/%m/%y is rather common. This means that in international context this format is ambiguous and should not be used.) (SU)
 %e   Like %d, the day of the month as a decimal number, but a leading zero is replaced by a space. (SU)
 %E   Modifier: use alternative format, see below. (SU)
 %F   Equivalent to %Y-%m-%d (the ISO 8601 date format). (C99)
 %G   The ISO 8601 week-based year (see NOTES) with century as a decimal number. The 4-digit year corresponding to the ISO week number (see %V). This has the same format and value as %Y, except that if the ISO week number belongs to the previous or next year, that year is used instead. (TZ)
 %g   Like %G, but without century, that is, with a 2-digit year (00-99). (TZ)
 %h   Equivalent to %b. (SU)
 %H   The hour as a decimal number using a 24-hour clock (range 00 to 23).
 %I   The hour as a decimal number using a 12-hour clock (range 01 to 12).
 %j   The day of the year as a decimal number (range 001 to 366).
 %k   The hour (24-hour clock) as a decimal number (range 0 to 23); single digits are preceded by a blank. (See also %H.) (TZ)
 %l   The hour (12-hour clock) as a decimal number (range 1 to 12); single digits are preceded by a blank. (See also %I.) (TZ)
 %m   The month as a decimal number (range 01 to 12).
 %M   The minute as a decimal number (range 00 to 59).
 %n   A newline character. (SU)
 %O   Modifier: use alternative format, see below. (SU)
 %p   Either "AM" or "PM" according to the given time value, or the corresponding strings for the current locale. Noon is treated as "PM" and midnight as "AM".
 %P   Like %p but in lowercase: "am" or "pm" or a corresponding string for the current locale. (GNU)
 %r   The time in a.m. or p.m. notation. In the POSIX locale this is equivalent to %I:%M:%S %p. (SU)
 %R   The time in 24-hour notation (%H:%M). (SU) For a version including the seconds, see %T below.
 %s   The number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). (TZ)
 %S   The second as a decimal number (range 00 to 60). (The range is up to 60 to allow for occasional leap seconds.)
 %t   A tab character. (SU)
 %T   The time in 24-hour notation (%H:%M:%S). (SU)
 %u   The day of the week as a decimal, range 1 to 7, Monday being 1. See also %w. (SU)
 %U   The week number of the current year as a decimal number, range 00 to 53, starting with the first Sunday as the first day of week 01. See also %V and %W.
 %V   The ISO 8601 week number (see NOTES) of the current year as a decimal number, range 01 to 53, where week 1 is the first week that has at least 4 days in the new year. See also %U and %W. (SU)
 %w   The day of the week as a decimal, range 0 to 6, Sunday being 0. See also %u.
 %W   The week number of the current year as a decimal number, range 00 to 53, starting with the first Monday as the first day of week 01.
 %x   The preferred date representation for the current locale without the time.
 %X   The preferred time representation for the current locale without the date.
 %y   The year as a decimal number without a century (range 00 to 99).
 %Y   The year as a decimal number including the century.
 %z   The +hhmm or -hhmm numeric timezone (that is, the hour and minute offset from UTC). (SU)
 %Z   The timezone name or abbreviation.
 %+   The date and time in date(1) format. (TZ) (Not supported in glibc2.)
 %%   A literal '%' character.

 

Transaction Replication

Transaction Replication dnedev Wed, 07/22/2015 - 17:39

As of the Jun2016 release, MonetDB has added high-availability support through transaction replication (NB: this feature is still somewhat experimental and should therefore be used with great care!).

Transaction-replication is done by shipping the transaction logs from the master instance to a number of replicas. It is generally described as a pull model, where each replica pulls transactions independently and asynchronously. Aside from the transaction logs, there is no further master-replica information exchange. Formally this is considered Lazy Centralized replication with Limited Transparency.

By default a MonetDB server stores transactions in Write-Ahead Log (WAL) files, before the transactions are persisted in the primary BATs storage. During the database startup, the transaction log files are read and any data changes not yet persisted in the BATs are made persistent. A MonetDB replica can be configured to read the WAL files of a master instance, load the transactions and persist the data in its own BATs.

On the master instance, MonetDB must be configured to keep all transaction log files, even those of transactions that have already been persisted in its primary storage, because by default the database server cleans up persisted transaction log files. The transaction log files on the master have to be shipped to the replica(s), which can be done using a highly-available shared file system or by alternative means.

On a replica instance, the location of the master transaction log files must be configured. In addition, the transaction drift threshold between the replica and the master must be set. The drift is the maximal difference allowed between the transactions processed by the master and the replica. If a replica detects that it has drifted past the set threshold, it will apply the transactions in the WAL files to catch up with the master. Finally, the replica must be set to run in read-only mode.

The master is the only instance that can apply changes to the data (create, insert, update, delete) to avoid any data inconsistencies. As such all replicas must run in read-only mode, where data changes will be propagated only through the transaction-replication mechanism. To guarantee that a replica instance only replicates the master's WAL, a MonetDB instance will never function in replica mode, unless it is started in read-only mode.

Setup

First and foremost, the shipping of the transaction log files between the master and the replicas must be configured. The most straightforward way is to use a shared file system, which also provides high-availability for the master transaction log files. The master and replica instances must run the same version of MonetDB.

Master:

  • The flag "gdk_keep_persisted_log_files" denotes the number of old WAL files, i.e. WAL files of transactions that have already been persisted at the master instance. If the flag is not set (i.e. gdk_keep_persisted_log_files=0, which is the default), the master will clean up all old WAL files during a garbage collection action. If the flag is set to N, the master will keep the most recent N WAL files during garbage collection. Without the WAL files, a replica will not be able to read and replicate all transactions. To allow old replicas to catch up, one might set this flag to a fairly large number.
    • gdk_keep_persisted_log_files=1000
  • (Optionally) Set a custom location for the transaction log files. This is useful if you are using a shared file system and/or you want the WALs to be written to a different location than where your database resides (e.g. under a shared FS volume).
    • gdk_logdir=<path to transaction log directory>

Replica:

  • Set the location of the master transaction log files, made available locally through the log shipping mechanism set up earlier.
    • gdk_shared_logdir=<path to master transaction log directory>
  • Set the shared transaction drift threshold, i.e. how far the replica may lag behind the master. It should be ≥ 0.
    • gdk_shared_drift_threshold=<desired drift threshold>
  • Set the read-only flag. The replica must always run in read-only mode, only replicating transactions from the master, never applying any of its own.
    • --readonly

Here is an example how to set up transaction replication using a shared file system for the master WAL.

-- Start a master server, set gdk_keep_persisted_log_files to 1 and a directory for the master WAL, placed on a shared filesystem
$ mserver5 --dbpath=<path-to-master-db> --set mapi_port=50000 --set gdk_keep_persisted_log_files=1 --set gdk_logdir=<path-to-shared-fs-for-master-logs>
-- Start a replica server, pointing gdk_shared_logdir to the previously shared master WAL dir, set gdk_shared_drift_threshold to the desired value, and enable read-only mode
$ mserver5 --dbpath=<path-to-replica-db> --set mapi_port=50001 --set gdk_shared_logdir=<path-to-shared-fs-for-master-logs> --set gdk_shared_drift_threshold=0 --readonly

Clusters configurations

The recommended cluster configuration is a warm standby, where:

  • There is a single master instance, which can do read and write operations
  • There are one or more replica instances, which process no queries, but only replicate the master transactions asynchronously
  • Upon master failure, a replica instance can be restarted in non-read-only mode, to take over the role of a master.

The warm standby configuration has the advantage that it can provide increased fault tolerance and is relatively simple to set up. Since only the instance loading the data can be queried, queries operate on an always up-to-date store. Since the replication is done asynchronously, if the master goes down, the replacing replica may lag behind the former master. The only way to avoid that is to use a shared file system for the log shipping (see below). This ensures that, in case of master failure, the last logs are available for the replica to replicate before it is restarted as master.

Log shipping

We also plan to extend the MonetDB-Daemon (monetdbd) to support log shipping and master-replica cluster configurations, to be used in conjunction with the transaction-replication. Until that time, it is recommended that a shared file system is used for shipping the transaction logs. It is advisable to choose a shared file system that is also fault tolerant, such that loss of a master instance will not lead to the loss of the master transaction logs as well. This way the shared file system provides both log shipping and log backup. Since the master instance preserves all transaction log files, there will be a complete copy of the data of that database (in the form of transactions). Good examples of such file systems include:

If MonetDB is deployed in the cloud, the native storage of the cloud provider can be used. For example, in AWS one can set up WAL file replication on the Amazon Elastic File System (EFS). It has high durability and availability, making it ideal for both log shipping and backup.

MonetDB is primarily designed to be used as an analytical database. As such, data are best loaded in large bulk transactions. This will also guarantee that only single large files are shipped to the replicas for replication, minimising the transaction drift.

Known problems and limitations

  • The master transaction log directory must be available upon a replica start up, otherwise the replica instance will not start.
  • If a replica detects a missing master transaction log file, it will stop, since it will be unable to replicate all master transactions.
  • Although a replica does not refuse to execute read-only queries, this is unsafe. It can cause a replica to crash or corrupt its replicated database.
  • Currently, the database users must ensure the replication and availability of the master transaction log files. This can be done via a shared file system or an alternative mechanism. However, we do plan to build that into monetdbd in the future.
  • Besides the "gdk_keep_persisted_log_files" flag, there is currently no mechanism/call to instruct the master to clean up the WAL files earlier (after all replicas have been synced). This could lead to increased disk space usage. Adding an external clean-up call is pending.

Lazy Logical Replication

Lazy Logical Replication mk Mon, 05/01/2017 - 21:55

EXPERIMENTAL CODE available in development branch as of May 2017.

Lazy logical replication has been added to the source code to realize an asynchronous logical replication management scheme using change set forwarding. Simplicity and ease of end-user control have been the driving arguments in its development.

What is Lazy Logical Replication?

In a data analytics environment the workload on a database is largely read only. Applications grind the data for business insights and summarizations through visual dashboards. Updates are often collected as (micro-) batches and injected into the data warehouse at regular intervals.

If the ingestion rate increases, updates become more complex, or the number of concurrent data analysis applications rises, it becomes mandatory to create a master/replica infrastructure. The master instance is responsible for handling all updates to the database, and replica instances are created to satisfy the responsiveness required by the applications.

A key observation for this common business scenario is that the replicas may lag a little behind: data analysts often look at long-term patterns using statistical summarizations, so the outcome is less dependent on what happened during the last minute. Furthermore, the replicated data warehouse is likely to run in a cloud setting, or on a cluster with a shared global file system. This creates room to simplify the synchronisation between instances by relying on the services provided by the file system. In particular, master and replicas share the same <dbfarm> directory.

The Lazy Logical Replication technique rests on detecting change sets in the persistent tables at the master instance, which are collected in a transactionally safe way and replayed at the replica. Replay can be interrupted to obtain a time-warped copy of the database.

When to consider Lazy Logical Replication?

The goal of this extension module is to ease backup and replication of a complete master database with a time-bounded delay. This means that both the master and the replicas run at a certain beat (expressed in seconds) by which information is made available by the master or read by the replicas. Such an instance can be freely used for query workload sharing, database versioning, and (re-)partitioning. For example, a replica can be used to support a web application which also keeps application-specific data, eg session information, in the same instance.

Tables taken from a master can be protected against updates and inspections in a replica instance using the schema access policies defined by the master. Furthermore, updates against replicated tables are not automatically forwarded to the master. Any transaction change set replay that fails stops the cloning process. By default, only persistent tables are considered for replication, and all constraints maintained by the master are carried over to the replicas. Updates under the 'tmp' schema, ie temporary tables, are ignored.
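
As a hedged illustration of such an access policy, the master could mark a replicated table as read only, so that the restriction is carried over to the replicas along with the other constraints (the table name is illustrative):

        -- on the master: disallow further updates on this table
        ALTER TABLE sales SET READ ONLY;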

For a backup, we need either all update logs for the entire lifetime of a database, or a binary database snapshot with a collection of logs that have recorded all changes since the snapshot was created. For a replication, also called a database clone, we take a snapshot and the log files that reflect the recent changes. Then, the logged updates are replayed against the snapshot until a specific point in time or transaction id is reached, as identified by the clone itself.

The underlying assumption of the techniques deployed is that the database resides on a proper (global/distributed) file system that guarantees recovery from most storage-system-related failures, eg using RAID disks or log-structured file systems.

How to set up a master instance?

The safest way to create a master/replica instance is to start with an empty master database. Alternatively, one can stop the master instance, take a binary copy of the <dbfarm>/<dbname> directory and save it for initialization of the replica. A database instance can be set into the 'master' mode only once using the SQL command:

        CALL master();

An optional path to the log record directory can be given to reduce the I/O latency, eg using a nearby SSD, or where there is ample space to keep a long history, such as an HDD or a cold storage location. By default, the command creates a directory /<path-to>/<dbfarm>/<dbname>/wlc_logs to hold all logs, and a configuration file /<path-to>/<dbfarm>/<dbname>/wlc.config to hold the state of the transaction logs (WLC stands for Work Load Capture). It contains the following <key>=<value> pairs:

        snapshot=<path to a snapshot directory>
        logs=<path to the wlc log directory>
        state=<1: started, 2: stopped>
        batches=<next available batch file to be applied>
        beat=<maximal delay between log files, in seconds>
        write=<timestamp of the last transaction recorded>
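
For example, to keep the captured logs on a dedicated fast or spacious device, the optional path argument to master() can be used; a minimal sketch (the path is purely illustrative):

        -- store the WLC logs on a separate device instead of the default wlc_logs directory
        CALL master('/ssd/wlc_logs');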

A missing snapshot path denotes that we can start a replica from an empty database. The log files are stored as <dbname>_<batchnumber> in the "wlc_logs" directory. They belong to the snapshot. Each WLC log file contains a serial log of committed compound transactions. The log records are represented as ordinary MAL statement blocks, which are executed in serial mode. Each transaction is identified by a unique id, its starting time, and the responsible database user. The log records must end with a COMMIT to be allowed for re-execution. Log records with a ROLLBACK tag are merely kept for off-line analysis by the DBA.

A transaction log file is created by the master using a heartbeat, expressed in seconds. A new transaction log file is published after the system has been collecting transaction records for some time. The beat can be set using the SQL command:

        CALL masterbeat(<duration>);

Setting the master heartbeat to zero leads to one log file per transaction, which may result in a log directory with a very large number of files. A default of 5 minutes should balance the polling overhead in most practical situations. The log file is shared within 'beat' seconds after the first transaction record was written into it.
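
For instance, to adopt the suggested 5-minute beat, or to force one log file per transaction during testing:

        CALL masterbeat(300);   -- publish a new log file at most every 5 minutes
        CALL masterbeat(0);     -- one log file per transaction (use with care)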

The final step in the life time of a master instance is to stop transaction logging with the SQL command:

        CALL stopmaster();

This marks the end-of-life time for a snapshot. For example, when planning to do a large bulk load of the database, stopping logging avoids a double write into the database. The database can only be brought back into the master mode using a fresh snapshot.
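
A minimal sketch of this bulk-load scenario, with an illustrative table name and file path:

        CALL stopmaster();   -- stop transaction logging to avoid the double write
        COPY INTO sales FROM '/data/bulk_refresh.csv' USING DELIMITERS ',','\n','"';
        -- afterwards, take a fresh binary snapshot before re-enabling master mode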

One of the key challenges for a DBA is to keep the log directory manageable, because it grows at the speed at which updates are applied to the database. This calls for regularly checking its disk footprint and taking a new snapshot as a frame of reference. A master instance has no knowledge about the number of clones and their whereabouts. To ensure transactional ACID properties, the log records are written to disk within the transaction brackets, which may cause extra I/O pressure. This can be alleviated by storing the database and log files on an SSD or a Non-Volatile Memory (NVM) device.

How to make a replica instance?

Every clone starts off with a copy of the binary snapshot identified by 'snapshot'. A fresh database can be turned into a clone using the call:

        CALL replicate('mastername');

It will grab the latest snapshot of the master and apply all available log files before releasing the database. Progress of the replication can be monitored using the -fraw option in mclient.

The clone process will iterate in the background through the log files, applying all updating transactions. An optional timestamp or transaction id can be passed to the replicate() command to apply the logs up to a specific moment or transaction. This is particularly useful when an unexpected disastrous user action, eg dropping a persistent table, has to be recovered from.

        CALL replicate('mastername');
        CALL replicate('mastername',NOW()); -- stops after we are in sync
        ...
        CALL replicate(NOW()); -- partial roll forward
        ...
        CALL replicate();      -- continue nondisturbed synchronisation

        SELECT replicaClock(); -- returns the timestamp of the last replicated transaction.
        SELECT replicaTick();  -- returns the transaction id of the last replicated transaction.
        SELECT masterClock();  -- returns the timestamp of the last committed transaction in the master.
        SELECT masterTick();   -- returns the transaction id of the last committed transaction in the master.
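
These status functions can be combined to gauge how far a replica lags behind; a hedged sketch, assuming both the master and replica counters are reachable from the same session:

        -- hypothetical lag check: committed master transactions not yet replayed here
        SELECT masterTick() - replicaTick() AS transactions_behind;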

Any failure encountered during a change set replay terminates the replication process, leaving a message in the merovingian log.

Features to be considered beyond the Alpha release

  • Include master/replica management commands into the monetdb control program.
  • Turn off logging of the replicated tables in the replica instance, since we already have them.
  • Roll forward of SQL session variables, i.e. optimizer_pipe. For updates no special care is needed.
  • Agents (message bus) to ship the log and snapshot files between different file systems to improve local access speed.
  • Selective replication by partitioning the log files based on predicates.
  • Deploy parallel updates on the change set files.