M.Raasveldt at cwi.nl
Sat Sep 24 16:07:02 CEST 2016
To expand on the previous experiment, I ran it again on three different systems, using the following query:
SELECT MIN(i * 2) FROM integers;
Here, 'integers' contains 100M randomly generated integers between 0 and 100. The MAL operations performed are an upcast from int to lng (to prevent the multiplication from overflowing) and the actual multiplication, which together require two allocations of large transient BATs. The server, the Jun2016 SP2 release candidate, was started either with "--set gdk_mmap_minsize=1000000000000" (forcing it to use malloc for the intermediates) or with no parameters (so the intermediates are stored in memory-mapped files).
The results seem to be mainly operating system related. Performing the test on three operating systems (Windows 10, Fedora 24 and OSX 10.11) results in the following timings.
Windows 10:
mmap: 10.3 seconds
malloc: 0.9 seconds

Fedora 24:
mmap: 2.0 seconds
malloc: 1.7 seconds

OSX 10.11:
mmap: 4.5 seconds
malloc: 1.1 seconds
On Fedora, the difference between mmap and malloc is small: malloc is slightly faster (1.7 vs. 2.0 seconds), but not by much. On both Windows and OSX, mmap is much slower, and on Windows the difference is extreme (10.3 vs. 0.9 seconds).
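For reference, the slowdown factors implied by these timings can be computed with a few lines of Python. This assumes the three mmap/malloc pairs are listed in the same order as the operating systems named above (Windows 10, Fedora 24, OSX 10.11); the variable names are only illustrative. Under that assumption, mmap comes out roughly 11x slower on Windows, 4x slower on OSX and 1.2x slower on Fedora.

```python
# Timings in seconds, taken from the measurements above.
# Assumption: the pairs appear in the order Windows 10, Fedora 24, OSX 10.11.
timings = {
    "Windows 10": {"mmap": 10.3, "malloc": 0.9},
    "Fedora 24": {"mmap": 2.0, "malloc": 1.7},
    "OSX 10.11": {"mmap": 4.5, "malloc": 1.1},
}

# Slowdown factor of mmap relative to malloc on each system.
slowdown = {name: t["mmap"] / t["malloc"] for name, t in timings.items()}

for name, ratio in slowdown.items():
    print(f"{name}: mmap is {ratio:.1f}x slower than malloc")
```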
Considering malloc behaves ‘as expected’ (returns NULL if there is not enough physical memory) on Windows and OSX, I suggest setting gdk_mmap_minsize to its maximum value on those systems and letting malloc failures dictate when to switch to mmap for large files.
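To make the proposed strategy concrete, here is a minimal sketch of the control flow: try an in-memory allocation first and only fall back to a memory-mapped file when that allocation fails. It is written in Python purely for illustration and is not the GDK code or the attached patch; alloc_transient and its in_memory_alloc parameter are hypothetical names, and MemoryError stands in for malloc returning NULL. As noted further down the thread, on Linux with the default overcommit setting malloc may succeed regardless, so the fallback would rarely be taken there.

```python
import mmap
import tempfile


def alloc_transient(nbytes, in_memory_alloc=bytearray):
    """Hypothetical helper: heap allocation first, mmap-backed file on failure.

    Returns a (kind, buffer) pair, where kind is "malloc" or "mmap".
    nbytes must be greater than zero.
    """
    try:
        # Stand-in for malloc(); MemoryError plays the role of a NULL return.
        return "malloc", in_memory_alloc(nbytes)
    except MemoryError:
        # Fallback: back the buffer with a temporary file and map it.
        backing = tempfile.TemporaryFile()
        backing.truncate(nbytes)  # extend the file to the requested size
        return "mmap", mmap.mmap(backing.fileno(), nbytes)


if __name__ == "__main__":
    kind, buf = alloc_transient(1 << 20)
    print(kind, len(buf))
```

Passing a different in_memory_alloc makes it possible to simulate an allocation failure and exercise the fallback path without actually exhausting memory.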
> On 22 Sep 2016, at 10:44, Hannes Mühleisen <hannes.muehleisen at cwi.nl> wrote:
> Hi Foteini,
>> On 20 Sep 2016, at 23:45, Foteini Alvanaki <F.Alvanaki at cwi.nl> wrote:
>> It is interesting that malloc is faster than mmap.
> Yeah this is one of the magic MonetDB parameters…
>> Could you give more details about the setting of your experiment?
> just running select (i*2) from table;
> i is a column containing 100M integers. Ran on my laptop (OSX) with 4 threads.
>> How many threads, how many concurrent files (BATs), size of files,
>> type of data, kind of accesses etc.
>> ----- Original Message -----
>> From: Roberto Cornacchia <roberto.cornacchia at gmail.com>
>> To: Communication channel for developers of the MonetDB suite. <developers-list at monetdb.org>
>> Sent: Tue, 20 Sep 2016 18:40:41 +0200 (CEST)
>> Subject: Re: GDK_mmap_minsize again
>> If I may add, that is indeed the default behaviour of the kernel, which can
>> be disabled with
>> vm.overcommit_memory = 2
>> in /etc/sysctl.conf
>> Perhaps MonetDB could check this system setting and decide on which
>> strategy to use?
>> On 20 September 2016 at 18:24, Sjoerd Mullender <sjoerd at acm.org> wrote:
>>> As far as I understand it, malloc on Linux will happily succeed even if
>>> there is not enough memory+swap to hold all data. So you can't rely on
>>> malloc failures to tell you to switch to mmap.
>>> On 09/20/2016 06:19 PM, Hannes Mühleisen wrote:
>>>> Hello list,
>>>> we were wondering about the purpose of GDK_mmap_minsize when creating
>>>> transient columns. The attached patch will always *try* to malloc/realloc a
>>>> transient column but still fall back to memory-mapped files if malloc
>>>> should fail. This dramatically improves performance. Any good reason why
>>>> this should not be the default behaviour?
>>>> Mark and Hannes
>>> Sjoerd Mullender
>>> developers-list mailing list
>>> developers-list at monetdb.org
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list