Hello,
I've been following the developments of MonetDB off and on. It seems to
me that there is a real emphasis on developing XML-related
functionality, and less so on BI-related functionality.
I wanted to ask if there are any plans to allow tables to be partitioned
by some condition. This would have a couple of benefits. A table could
be broken out into several BAT files, one per condition, which could
help overcome the 2 GB BAT file limitation on 32-bit systems. And of
course, with matching intelligence in the optimizer, it could examine a
query's conditions and potentially eliminate entire BAT files from the
scan.
For example, assume you have a table with a column named sometype, and
chose sometype as the partitioning column. Sometype has 10 possible
values. Physically, MonetDB would create 10 different BAT files for each
column, one per possible value. When a query is executed with a
condition like "sometype = 1", the optimizer knows it needs to scan only
1 set of the BAT files, not all 10, significantly reducing the amount of
data that has to be examined.
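The pruning idea can be sketched in a few lines of Python (a toy illustration only; the dict-of-lists merely stands in for the per-value BAT files and is not how MonetDB stores anything):

```python
# Toy model of value-based partitioning: one "BAT file" per distinct
# value of the partitioning column sometype (here: a plain list each).
partitions = {value: [] for value in range(10)}

# Loading routes each row into the partition matching its sometype value.
rows = [(1, "a"), (3, "b"), (1, "c"), (7, "d")]
for sometype, payload in rows:
    partitions[sometype].append(payload)

def select_where_sometype_equals(value):
    # Partition elimination: for the condition "sometype = value",
    # only one of the 10 partitions has to be scanned at all.
    return list(partitions[value])

print(select_where_sometype_equals(1))  # -> ['a', 'c']
```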
If you are familiar with PostgreSQL, they introduced something like this
called Constraint Exclusion Partitioning (MySQL also has some
partitioning functionality in its new beta). Each partition is treated
as a subtable of a main parent table. You are required to insert
directly into the proper subtable, but querying the main parent table
will determine which subtables need to be examined to process the query.
Their scheme is not entirely convenient for loading data, but it is
quite flexible in setting up arbitrary conditions on the subtables.
Are there any plans to do something similar? What state is this in? I
believe I saw something about Partitioning on the roadmap several months
ago. The MonetDB home page mentions OLAP, and it seems to me that a
feature like this is critical if MonetDB really wants to handle
large-data-volume Business Intelligence queries.
Thanks.
--
moredata(a)fastmail.net
--
http://www.fastmail.fm - Choose from over 50 domains or use your own
Hi again.
The checkout instructions on the website
cvs -d:pserver:anonymous@monetdb.cvs.sourceforge.net:/cvsroot/monetdb checkout buildtools MonetDB sql
checks out Monet4.1x rather than Monet5 / Monet4.99.
How do I check out the latter?
I made some changes to 4.1x that I'd like to try on 4.99 before submitting
them (they are so short that I feel I made a MAJOR mistake somewhere, so I'll
be testing them at least one more day).
Btw, should I submit the .diffs to this list?
Zarrabeitia.
On Tue, Jul 25, 2006 at 10:02:16PM +0200, Arjen P. de Vries wrote:
> Wasn't the property checking always determined by debug flag 8?
in general: yes.
for join: no, at least not any more since the following checkin:
===================================================================
2003/08/06 - sjoerd: src/gdk/gdk_relop.mx,1.38
DDI Merge.
This is the big one!
Before tag: BeforeDDImerge, after tag: AfterDDImerge.
The DDI merge is now made the official main branch; the CWI_DDI_merge branch
is not to be used anymore.
===================================================================
Well, Sjoerd just did the checkin; the origin is most probably Peter.
As I said, there are indeed good reasons to do the BATpropcheck (even with
debug=0) for *arbitrary* joins, where we can hardly predict any result
properties --- maybe we should consider not doing it in those cases
(especially fetchjoin) where we can predict the join result properties
pretty well...
Stefan
> On 25/07/06, Martin Kersten <Martin.Kersten(a)cwi.nl> wrote:
> >
> >
> >
> >Stefan Manegold wrote:
> >> Dear all,
> >>
> >>
> >> - We should consider making this BATpropcheck in BATjoin and BATleftjoin
> >> configurable from the outside, e.g., by adding an extra optional
> >> argument to join() to request skipping the BATpropcheck.
> >>
> >Instead of pollution of the calling interface, I think we should inject
> >an explicit request to check/enforce properties. This one might also be
> >focussed on those properties of interest according to the optimizer at
> >that point.
> >
> >In M5:
> > b:= algebra.join(BAT,BAT);
> > bat.setProperties(b);
> >
> > ...
> > bat.checkDenseProperty(b);
> > bat.checkOrderProperty(b);
> > bat.checkKeyProperty(b);
> >
> >But first, assess the use of the join(bat,bat) in our controlled
> >code base. Please report on this.
> >
> >regards, Martin
> >
> >
>
>
> --
> ====================================================================
> CWI, room C1.16 Centre for Mathematics and Computer Science
> Kruislaan 413 Email: Arjen.de.Vries(a)cwi.nl
> 1098 SJ Amsterdam tel: +31-(0)20-5924306
> The Netherlands fax: +31-(0)20-5924312
> ===================== http://www.cwi.nl/~arjen/ ====================
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
Dear all,
to prevent you from falling into the same trap as I did this afternoon when
trying to analyze, measure & compare positional (fetch-)join performance,
here's a warning and my detailed story:
WARNING:
When running 'join(BAT,BAT);' in MIL or 'algebra.join(BAT,BAT);' in MAL,
be aware that this includes not only the actual join algorithm, but also a
BATpropcheck() call to check and set properties of the join result
(ALWAYS, i.e., also with the default --debug=0 !)
NOTE:
The latter can easily be up to 85% (*eighty-five* percent!) of the time you
measure with a simple wall-clock around the 'join(BAT,BAT);' or
'algebra.join(BAT,BAT);' (see details below) !
Detailed Story:
Preparing to invent and implement another cache-aware/-friendly positional
(fetch-)join algorithm for some special cases, I was trying to assess the
performance of the current positional (fetch-)join algorithms for the
following three cases:
1) join(ld,r);
2) join(lo,r);
3) join(lu,r);
In all three cases, r is a [V(o)ID, INT] BAT, i.e., has a dense
non-materialized head and a randomly distributed integer tail.
ld, lo, lu are all [INT, OID] BATs with a randomly distributed integer tail,
and an OID head that is
in ld: dense (and marked so in the BAT header) (but materialized)
in lo: ordered (actually also dense, but only marked ordered, not
marked dense in the BAT header)
in lu: unordered (random shuffle of an originally dense sequence)
Thus, the three cases trigger the use of the following join
implementations, respectively:
1) densefetchjoin
2) orderedfetchjoin
3) defaultfetchjoin
I used the same size (count) for all four BATs, and the join is a perfect
1:1 hit; hence the result is as big as the inputs.
Here are the wall-clock times (in microseconds) for BATs of 10000000 tuples:
MonetDB 4.13.1 (CVS HEAD)
default compilation (gcc -g -O2)
64-bits and 64-bit OIDs
2 GHz AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
2 GB RAM
1) join(ld,r); 2065162 us
2) join(lo,r); 2090277 us
3) join(lu,r); 3058936 us
A close look under the hood (i.e., in BATjoin in src/gdk/gdk_relop.mx) reveals the
following split timings for the actual join (batjoin()) and the property
checking (BATpropcheck()):
                  batjoin()      BATpropcheck()    join()
1) join(ld,r);     275331 us      1774275 us      2065162 us  (abs)
                    13.33 %         85.91 %        100.00 %   (rel)
2) join(lo,r);     303311 us      1770722 us      2090277 us  (abs)
                    14.51 %         84.71 %        100.00 %   (rel)
3) join(lu,r);    1268057 us      1775788 us      3058936 us  (abs)
                    41.45 %         58.05 %        100.00 %   (rel)
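As a quick sanity check, the relative shares in the table above can be recomputed from the absolute timings (plain Python arithmetic, nothing MonetDB-specific):

```python
# Absolute wall-clock timings in microseconds, copied from the table:
# (batjoin, BATpropcheck, total join) per case.
timings = {
    "join(ld,r)": (275331, 1774275, 2065162),
    "join(lo,r)": (303311, 1770722, 2090277),
    "join(lu,r)": (1268057, 1775788, 3058936),
}

for name, (batjoin, propcheck, total) in timings.items():
    # The remainder up to 100 % is the surrounding BATjoin bookkeeping.
    print(f"{name}: batjoin {100 * batjoin / total:5.2f} %, "
          f"BATpropcheck {100 * propcheck / total:5.2f} %")
```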
Well, there are some lessons to learn:
- Be aware of the BATpropcheck when you measure join performance!
- The "mandatory" BATpropcheck is only used for BATjoin and BATleftjoin
in src/gdk/gdk_relop.mx --- at least as far as I know.
- The "mandatory" BATpropcheck in BATjoin and BATleftjoin is there for a
good reason: with joins, it is hard to guess the properties of the result
(are head and/or tail sorted, dense, and/or key/unique?) in general, but
the performance of MonetDB highly depends on the availability of these
properties; hence, while making the join (considerably) more expensive,
the investment might pay off with later operations on the join result.
- We should consider making this BATpropcheck in BATjoin and BATleftjoin
configurable from the outside, e.g., by adding an extra optional argument
to join() to request skipping the BATpropcheck.
I hope you find this information useful...
Stefan
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
Dear all,
Sorry that I cannot do this via the SF bug entry; I currently cannot
log in.
I am trying to deploy MonetDB on a Windows machine (dual core, 2 gig),
first using the release (MSI) installer.
After importing 200k records, I then got this error:
# Monet Database Server V4.12.0
# Copyright (c) 1993-2006, CWI. All rights reserved.
# Compiled for i686-pc-win32/32bit with 32bit OIDs; dynamically linked.
# Visit http://monetdb.cwi.nl/ for further information.
!WARNING: BBPincref: range error 283
!WARNING: BBPincref: range error 284
!WARNING: BBPincref: range error 282
!WARNING: BBPincref: range error 285
I thought there might be XP-specific DLLs that need to be compiled in.
So I downloaded Microsoft Visual C++ Express (free nowadays, with Windows
SDK SP2 :-) and ran nmake in the NT directory (yes, all stated
prerequisites such as Python are met), and I get the following message:
D:\Development\Projects_C++\Monet_CVS\MonetDB\NT>nmake NEED_MX=1
Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.
python ".\configure.py" "." "D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" monetdb_config.h.in > monetdb_config.h
echo #ifndef UNISTD_H > unistd.h
echo #define UNISTD_H >> unistd.h
echo #include "io.h" >> unistd.h
echo #define open _open >> unistd.h
echo #define read _read >> unistd.h
echo #define write _write >> unistd.h
echo #define close _close >> unistd.h
echo #define getpid _getpid >> unistd.h
echo #define umask _umask >> unistd.h
echo #endif >> unistd.h
if exist ".\..\RunMserver.bat.in" python ".\configure.py" "." "D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" ".\..\RunMserver.bat.in" > RunMserver.bat
if exist ".\..\RunMapiClient.bat.in" python ".\configure.py" "." "D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" ".\..\RunMapiClient.bat.in" > RunMapiClient.bat
if exist ".\..\RunMtest.bat.in" python ".\configure.py" "." "D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" ".\..\RunMtest.bat.in" > RunMtest.bat
if exist ".\..\RunMapprove.bat.in" python ".\configure.py" "." "D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" ".\..\RunMapprove.bat.in" > RunMapprove.bat
"D:\EPF\Microsoft Visual studio 8\VC\BIN\nmake.exe" /nologo /f ".\..\Makefile.msc" "prefix=D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" all
if not exist "src" mkdir "src"
copy ".\..\src\Makefile.msc" "src\Makefile"
1 file(s) copied.
cd "src" && "D:\EPF\Microsoft Visual studio 8\VC\BIN\nmake.exe" /nologo "prefix=D:\Development\Projects_C++\Monet_CVS\MonetDB\NT" all
if not exist "common" mkdir "common"
NMAKE : fatal error U1073: don't know how to make '".\..\..\src\common\Makefile.msc"'
Stop.
NMAKE : fatal error U1077: 'cd' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"D:\EPF\Microsoft Visual studio 8\VC\BIN\nmake.exe"' : return code '0x2'
Stop.
D:\Development\Projects_C++\Monet_CVS\MonetDB\NT>
Is there anything to be done about this? I'm not much of a C++ whizz.
Kind regards,
Dennis Rutjes
Hello.
I need to pipe data into a MonetDB database (something along the lines
of 'myprocess | MapiClient -s "COPY ...."'). It doesn't seem to be
possible out of the box, so I wanted to poke around in the code to find
out why and how to fix it (if needed), and hopefully submit a patch.
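For what it's worth, the piping pattern itself can be sketched in Python; the MapiClient command line in the comment is only an assumption copied from the message above, while the portable stand-in `cat` keeps the sketch runnable:

```python
import subprocess

def pipe_into(cmd, data):
    """Feed `data` (bytes) to `cmd` on stdin and return its stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout

# Hypothetical MonetDB usage, mirroring 'myprocess | MapiClient -s "COPY ..."'
# (command name, flag, and COPY statement are untested assumptions here):
#   pipe_into(["MapiClient", "-s", "COPY ..."], csv_bytes)

# Portable stand-in so the sketch actually runs:
print(pipe_into(["cat"], b"1,alpha\n2,beta\n").decode(), end="")
```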
I've been having a problem with that (playing with the code): I don't
know how to 'compile' the modified .mx files in the sql module. I have
both bison and flex installed, but the makefile uses the commands 'x'
and 'l' to process the .mx files, and I have no idea what x and l are.
So, in a nutshell, my question is: "How do I recompile the sql module after I
modify the .mx files? What are 'x' and 'l'?"
Regards,
Zarrabeitia.
Hello,
I thought I'd send a new posting to specify my goal. I want to use
MonetDB as a native XML database embedded in a Java application. That's
why I would like to know where I can find the text file mentioned in the
article at the following link:
http://www.monetdb.nl/TechDocs/APIs/JDBC/node9.html
In my first posting I mentioned that I was trying to find the text file
in some files I downloaded; those were XQuery binaries and source files.
Perhaps I was looking in the wrong location; could somebody tell me
where to find the needed text file?
With kind regards,
Berny