[Monetdb-developers] sorting fetchjoin

Peter Boncz P.Boncz at cwi.nl
Tue Apr 7 10:37:02 CEST 2009


Martin,

you suggest:

> If we do a fetchjoin(L,R) to a table R that does not fit in memory,
> and the operand L has more oids than the number of pages in R
> then we should also sort L on the tail oids first

The problem is that many MonetDB applications use fetchjoin as an order-aware
operator and expect the output to be in left-input order. This is then
exploited in tuple reconstruction. Sorting L first will certainly break
pathfinder, for instance. Applications like Proximity would also break.
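
To make the ordering dependency concrete, here is a minimal sketch in Python. It
is not MonetDB code: the function fetchjoin, the (head, tail oid) pair layout,
and the sample data are all assumptions made for illustration. It shows how a
consumer that reconstructs tuples by position gets wrong rows once L is sorted
on its tail oids.

```python
def fetchjoin(left, right):
    """Hypothetical positional fetchjoin.

    left  : list of (head, tail_oid) pairs
    right : list of values, indexed by oid
    Returns (head, value) pairs in the order of `left`.
    """
    return [(head, right[oid]) for head, oid in left]


left = [(0, 3), (1, 0), (2, 2)]   # tail oids are deliberately unsorted
right = ["a", "b", "c", "d"]      # values of R at oids 0..3

# Output follows left-input order.
in_order = fetchjoin(left, right)

# Sorting L on its tail oids first changes the output order.
sorted_first = fetchjoin(sorted(left, key=lambda pair: pair[1]), right)

# A second column that is aligned with L by position only.
other_column = ["x", "y", "z"]

# Tuple reconstruction by position: correct for in_order, wrong after sorting.
good_rows = [(o, v) for o, (_, v) in zip(other_column, in_order)]
bad_rows = [(o, v) for o, (_, v) in zip(other_column, sorted_first)]
```

With in_order, "x" is paired with the value fetched for the first tuple of L;
with sorted_first it is paired with whichever tuple happened to have the
smallest oid.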

In my thesis, I had suggested creating an accelerator that attaches a
radix-clustered version to the left input. Fetchjoin would then use this
radix-clustered version, and call decluster on the result, restoring the
original order (using a cache- and disk-friendly pattern).
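
The cluster, fetch, decluster idea can be sketched roughly as follows. This is
only an illustration under stated assumptions: a full sort stands in for
radix-clustering (which would only partially order the oids), a plain scatter
stands in for decluster, and the name clustered_fetchjoin is hypothetical
rather than anything in GDK.

```python
def clustered_fetchjoin(left_oids, right):
    """Fetch right[oid] for each oid in left_oids, returning the values
    in the original order of left_oids.

    Assumption: a full sort replaces the radix-cluster step, and a simple
    scatter replaces the decluster step.
    """
    # Permutation that visits the positions of L in ascending oid order.
    order = sorted(range(len(left_oids)), key=lambda i: left_oids[i])

    # Fetch in clustered order, so accesses to `right` move forward only.
    fetched = [right[left_oids[i]] for i in order]

    # "Decluster": write each value back to its original position in L.
    result = [None] * len(left_oids)
    for pos, value in zip(order, fetched):
        result[pos] = value
    return result


values = clustered_fetchjoin([3, 0, 2, 0], ["a", "b", "c", "d"])
```

The result is in left-input order, so order-aware consumers are unaffected,
while the accesses to R happen in oid order.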

However, I have come to dislike the idea of low-level accelerators. I now think
this is better handled on a higher layer, specifically the relational algebra
layer.

Peter



> -----Original Message-----
> From: Martin Kersten [mailto:Martin.Kersten at cwi.nl]
> Sent: Friday, April 03, 2009 12:42 PM
> To: monetdb-developers at lists.sourceforge.net; Peter Boncz
> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk.mx, , 1.278, 1.279
> gdk_bat.mx, , 1.213, 1.214 gdk_posix.mx, , 1.168, 1.169 gdk_relop.mx, ,
> 1.161, 1.162 gdk_utils.mx, , 1.241, 1.242
> 
> Peter Boncz wrote:
> > Update of /cvsroot/monetdb/MonetDB/src/gdk
> > In directory 23jxhf1.ch3.sourceforge.com:/tmp/cvs-serv31767/gdk
> >
> > Modified Files:
> > 	gdk.mx gdk_bat.mx gdk_posix.mx gdk_relop.mx gdk_utils.mx
> > Log Message:
> > various tweaks to the memory management
> >
> > - tune the GDK_mmap_minsize to vary between full RAM size and 128MB
> >   (lowest when memory pressure is extreme)
> >
> > - in fetchjoin, when the optimization of string heap copying is applied
> >   (instead of creating a new string heap and inserting into it), try
> >   to share the heaps in virtual memory (rather than copying it)
> >   This is possible with the logical view mechanism.
> >
> >   now fetchjoin, batcopy and remap use this VM heap sharing.
> If we do a fetchjoin(L,R) to a table R that does not fit in memory,
> and the operand L has more oids than the number of pages in R
> then we should also sort L on the tail oids first
> 
> This is under the constraint that sorting L also costs some I/O
> 
> If you are tweaking in that area anyway, such a patch is
> appreciated.
> >
> > - disable vmalloc on Linux -- well actually on platforms
> >   that do not have posix_fadvise. A bit of a hack.
> >
> >
> 




