To expand on the previous experiment, I ran it again on three different systems.

The query:

SELECT MIN(i * 2) FROM integers;

Where 'integers' contains 100M randomly generated integers between 0 and 100. The MAL operations performed are an upcast from int to lng (to prevent the multiplication from overflowing) and the multiplication itself, which together require two allocations of large transient BATs. The server ran the Jun2016 SP2 release candidate and was started either with "--set gdk_mmap_minsize=1000000000000" (to force the server to use malloc for the intermediates) or with no parameters (resulting in the intermediates being stored in memory-mapped files).
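To give a feel for the size of those two transient BATs, here is a rough back-of-the-envelope sketch. It assumes int is 4 bytes and lng is 8 bytes wide and ignores BAT headers and alignment; the function name is made up for illustration and is not part of MonetDB.

```python
INT_BYTES = 4   # assumed width of MonetDB's int
LNG_BYTES = 8   # assumed width of MonetDB's lng

def transient_bytes(rows, width=LNG_BYTES):
    """Payload size of one transient column, ignoring headers."""
    return rows * width

rows = 100_000_000
upcast = transient_bytes(rows)    # the int -> lng copy of the column
product = transient_bytes(rows)   # the result of i * 2

print(upcast / 1e6, "MB per intermediate")
print((upcast + product) / 1e6, "MB for both intermediates")
```

Under these assumptions each intermediate is roughly 800 MB, so the allocation strategy is exercised on two large buffers per query.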

The results seem to depend mainly on the operating system. Running the test on three operating systems (Windows 10, Fedora 24 and OSX 10.11) gives the following timings.

Windows 10
mmap: 10.3 seconds
malloc: 0.9 seconds

Fedora 24
mmap: 2.0 seconds
malloc: 1.7 seconds

OSX 10.11
mmap: 4.5 seconds
malloc: 1.1 seconds

On Fedora, the difference between mmap and malloc is small: malloc is about 15% faster (1.7 vs 2.0 seconds). On both Windows and OSX, mmap is much slower than malloc: about 4 times slower on OSX and more than 11 times slower on Windows.

Considering malloc behaves ‘as expected’ (returns NULL if there is not enough physical memory) on Windows and OSX, I suggest setting gdk_mmap_minsize to its maximum value on those systems and letting malloc failures dictate when to switch to mmap for large files.
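The "try malloc first, fall back to mmap on failure" idea can be sketched as follows. This is only an illustration in Python, not the patch discussed in this thread: the function name and the injectable heap_alloc parameter are hypothetical, a bytearray stands in for malloc, and a temporary file stands in for the BAT's backing file.

```python
import mmap
import tempfile

def alloc_transient(nbytes, heap_alloc=bytearray):
    """Try a heap allocation first; fall back to a file-backed mapping.

    heap_alloc is injectable so that the failure path can be exercised
    without actually exhausting memory.
    """
    try:
        return "malloc", heap_alloc(nbytes)
    except MemoryError:
        # Heap allocation failed: back the buffer with a temporary file.
        f = tempfile.TemporaryFile()
        f.truncate(nbytes)
        # mmap duplicates the file handle, so the mapping stays valid
        # after f goes out of scope.
        return "mmap", mmap.mmap(f.fileno(), nbytes)
```

Note that this only helps on systems where the heap allocation really fails when memory is short, which is the assumption made above for Windows and OSX.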

Regards,

Mark

On 22 Sep 2016, at 10:44, Hannes Mühleisen <hannes.muehleisen@cwi.nl> wrote:

Hi Foteini,

On 20 Sep 2016, at 23:45, Foteini Alvanaki <F.Alvanaki@cwi.nl> wrote:

It is interesting that malloc is faster than mmap.
Yeah this is one of the magic MonetDB parameters…

Could you give more details about the setting of your experiment?

Just running select (i*2) from table;
where i is a column containing 100M integers. Ran on my laptop (OSX) with 4 threads.

How many threads, how many concurrent files (BATs), size of files,
type of data, kind of accesses etc.


----- Original Message -----
From: Roberto Cornacchia <roberto.cornacchia@gmail.com>
To: Communication channel for developers of the MonetDB suite. <developers-list@monetdb.org>
Sent: Tue, 20 Sep 2016 18:40:41 +0200 (CEST)
Subject: Re: GDK_mmap_minsize again

If I may add, that is indeed the default behaviour of the kernel, which can
be disabled with

vm.overcommit_memory = 2
in /etc/sysctl.conf

Perhaps MonetDB could check this system setting and decide on which
strategy to use?
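A minimal sketch of such a check, assuming a Linux system that exposes the setting under /proc/sys/vm/overcommit_memory (0 = heuristic overcommit, 1 = always overcommit, 2 = no overcommit). The function names are hypothetical and this is not how MonetDB currently decides.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

def parse_overcommit(text):
    """Return the overcommit mode (0, 1 or 2) from the file contents."""
    return int(text.strip())

def overcommit_disabled(path=OVERCOMMIT_PATH):
    """True if the kernel is in strict accounting mode (2), else False.

    Returns None when the setting cannot be read, for example on a
    non-Linux system.
    """
    try:
        return parse_overcommit(path.read_text()) == 2
    except (OSError, ValueError):
        return None
```

A caller could treat malloc failures as a usable signal only when this returns True, and keep the size-based threshold otherwise.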

On 20 September 2016 at 18:24, Sjoerd Mullender <sjoerd@acm.org> wrote:

As far as I understand it, malloc on Linux will happily succeed even if
there is not enough memory+swap to hold all data.  So you can't rely on
malloc failures to tell you to switch to mmap.

On 09/20/2016 06:19 PM, Hannes Mühleisen wrote:
Hello list,

we were wondering about the purpose of GDK_mmap_minsize when creating
transient columns. The attached patch will always *try* to malloc/realloc a
transient column but still fall back to memory-mapped files if malloc
should fail. This dramatically improves performance. Any good reason why
this should not be the default behaviour?

Thanks,

Mark and Hannes




--
Sjoerd Mullender





_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list


