[Monetdb-developers] Partitioning

Martin Kersten Martin.Kersten at cwi.nl
Fri Jul 28 07:03:39 CEST 2006


Hello,

Thank you for your interest in MonetDB source and its future.
The answer to the question is simple. Indeed, partitioned tables
are a good mechanism to control scalability problems on small
scale systems. And experimental code has been written to assess
the consequences on the core.

One of the blocking reasons to include it is the lack of an
extendable optimizer infrastructure in the current release.
However, this has changed with the MonetDB 5 alpha release,
which is only available as source tarball.
The draft documentation may be of interest to you and shows
the required functionality,
http://homepages.cwi.nl/~mk/MonetDB/The-MAL-Optimizer.html#The-MAL-Optimizer

Furthermore, we are finishing a project extending
MonetDB Version 5 with the ability to deal with distributed
materialized views in SQL. A big step towards partitioned AND
distributed query processing. This code will certainly become
available for a larger audiance and it plays a crucial role
in managing a multi TB database under construction at our place.

regards, Martin

moredata at fastmail.net wrote:
> Hello,
> 
> I've been following the developments of MonetDB off and on. It seems to
> me that there is a real emphasis on XML-related functionality, and less
> so for BI-related functionality.
> 
> I wanted to ask if there are any plans to allow tables to be partitioned
> by some condition. This would have a couple of benefits. A table could
> be broken out into several BAT files, one for each condition met. That
> would allow you to potentially overcome the 2 GB BAT file limitation on
> 32 bit systems. And of course, when adding intelligence into the
> optimizer for this, it could examine the conditions of queries and
> potentially eliminate entire BAT files to scan.
> 
> For example, assume you table tab with a column named sometype, and used
> sometype for partitioning. Sometype has 10 possible values. Physically,
> MonetDB would create 10 different BAT files for each column,
> corresponding to each possible value. When a query is executed with a
> condition like "sometype = 1", the optimizer is smart enough to know it
> need only use 1 set of the bat files, and not all 10, significantly
> reducing the amount of data that needs to be examined.
> 
> If you are familiar with PostgreSQL, they introduced something like this
> called Constraint Exclusion Partitioning. Each partition is treated as a
> subtable of a main parent table. You are required to insert directly
> into the proper subtable, but querying on the main parent table will
> determine which subtables are required to be examined in processing the
> query. Their scheme is not entirely convenient with loading up data, but
> it is quite flexible in setting up arbitrary conditions on the
> subtables.
> 
> Are there any such plans to do something similar? At what state is this
> in? I believe I saw something about Partitioning on the roadmap several
> months ago. The MonetDB home page mentions OLAP, and it would seem to me
> that a feature like this is critical if MonetDB really wants to handle
> Business Intelligence type of queries.
> 
> Thanks!
> 




More information about the developers-list mailing list