[Monetdb-developers] join performance issue!
Stefan.Manegold at cwi.nl
Tue Jul 25 19:51:30 CEST 2006
to prevent you from falling into the same trap as I did this afternoon when
trying to analyze, measure & compare positional (fetch-)join performance,
here's a warning and my detailed story:
When running 'join(BAT,BAT);' in MIL or 'algebra.join(BAT,BAT);' in MAL,
be aware that this includes not only the actual join algorithm, but also a
BATpropcheck() call to check and set properties of the join result
(ALWAYS, i.e., also with the default --debug=0 !)
The latter can easily be up to 85% (*eighty-five* percent!) of the time you
measure with a simple wall-clock around the 'join(BAT,BAT);' or
'algebra.join(BAT,BAT);' (see details below) !
Preparing to invent and implement another cache-aware/-friendly positional
(fetch-)join algorithm for some special cases, I was trying to assess the
performance of the current positional (fetch-)join algorithms for the
following three cases:
In all three cases, r is a [V(o)ID, INT] BAT, i.e., has a dense
non-materialized head and a randomly distributed integer tail.
ld, lo, lu are all [INT, OID] BATs with a randomly distributed integer tail,
and an OID head that is
in ld: dense (and marked so in the BAT header) (but materialized)
in lo: ordered (actually also dense, but only marked ordered, not
marked dense in the BAT header)
in lu: unordered (random shuffle of an originally dense sequense)
Thus, the three case trigger the use following join implementations,
I used the same size (count) four BATs, and the join is a perfect 1:1 hit;
hence the result is as big as the inputs.
Here are the wall-clock times (in micro seconds) for BATs of 10000000 tuples
MonetDB 4.13.1 (CVS HEAD)
default compilation (gcc -g -O2)
64-bits and 64-bit OIDs
2 GHz AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
2 GB RAM
1) join(ld,r); 2065162 us
2) join(lo,r); 2090277 us
3) join(lu,r); 3058936 us
A close look under the hood (i.e., in BATjoin in src/gdk/gdk_relop.mx) reveals the
following split timings for the actual join (batjoin()) and the property
batjoin() BATpropcheck() join()
1) join(ld,r); 275331 us 1774275 us 2065162 us (abs)
13.33 % 85.91 % 100.00 % (rel)
2) join(lo,r); 303311 us 1770722 us 2090277 us (abs)
14.51 % 84.71 % 100.00 % (rel)
3) join(lu,r); 1268057 us 1775788 us 3058936 us (abs)
41.45 % 58.05 % 100.00 % (rel)
Well, there are some lessons to learn:
- Be aware of the BATpropcheck when you measure join performance!
- The "mandatory" BATpropcheck is only used for BATjoin and BATleftjoin
in src/gdk/gdk_relop.mx --- at least as far as I know.
- The "mandatory" BATpropcheck in BATjoin and BATleftjoin is there for a
good reason: With joins, it's hard to guess the properties of the result
(are head and/or tail sorted, dense, and or key/unique?) in general, but
the performance of MonetDB highly depends on the availability of these
properties; hence, while making the join (considerably) more expensive,
the investment (might) pay-off with later operation on the join result.
- We should consider making this BATpropcheck in BATjoin and BATleftjoin
configurable from the outside, e.g., by adding an extra optional argument
to join() to request to skipping the BATpropcheck.
I hope you find this information useful...
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
More information about the developers-list