Hi Stefan,

I have been using Jul2017-SP2 (11.27.9) for more than 2 years on a production server, without any significant memory problems apart from the occasional maintenance restart to free memory and get back to the expected performance.

After the release of the Apr2019 version (11.33.3) with its long-awaited feature additions, I decided to start the migration plan: I took a snapshot of the disk, installed Mar2018 (11.29.3), started the database for the migration, then installed Apr2019 and started the database again. All was working fine.

After consulting the log and making sure everything was OK, I started replaying my workload to test the performance and to see whether some memory leaks had been fixed. But after the second query I got the error below:

2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!ERROR: MT_mremap: open(/home/centos/dbfarm/db1/bat/21/77/217767.tail) failed
2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!OS: No such file or directory
2019-07-03 08:57:44 ERR db1[1538]: = gdk_posix.c:447: MT_mremap(/home/centos/dbfarm/db1/bat/21/77/217767.tail,0x7fdabdabb000,136380416,179961856): open() failed
2019-07-03 08:57:44 ERR db1[1538]: #GDKmremap(179961856) fails, try to free up space [memory in use=108540596072,virtual memory in use=134784814952]
2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!ERROR: HEAPextend: failed to extend to 179961856 for 21/77/217767.tail: GDKmremap() failed
2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!ERROR:MALException:bat.append:GDK reported error: MT_mremap: open(/home/centos/dbfarm/db1/bat/21/77/217767.tail) failed
2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!ERROR:!OS: No such file or directory
2019-07-03 08:57:44 ERR db1[1538]: #DFLOWworker6:!ERROR:!ERROR: HEAPextend: failed to extend to 179961856 for 21/77/217767.tail: GDKmremap() failed

The server is running CentOS 7.6 64-bit with kernel 5.1, 128 GB RAM, 16 vCPUs, and a 2 TB SSD; I tried both with swap on and off. At the moment the query was executed there was 1.4 TB of free disk space and the free command showed 40 GB of RAM as available.

After checking that the default value of GDK_VM_MAXSIZE is 4 TB, it must be some other setting that is preventing MonetDB from allocating the needed mmap file.
Do the settings you mentioned, "--set gdk_mmap_minsize_persistent=$[1<<33] --set gdk_mmap_minsize_transient=$[1<<33]", help in this case? Can you please explain a bit more about their role?
Is starting a fresh database on the new version and using msqldump to migrate the data the preferred way to avoid any conflicts with settings from an earlier MonetDB release?
Does changing any kernel settings, such as vm.max_map_count, have any effect on MonetDB?
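On that last question: vm.max_map_count caps how many memory mappings a single process may hold, and since MonetDB memory-maps many column heap files it is one plausible limit to rule out (though note the log above shows open() failing with "No such file or directory", i.e. a missing file, rather than mmap running out of map slots). A minimal check, assuming a Linux host:

```shell
# Read the per-process memory-map limit (typical default: 65530).
cat /proc/sys/vm/max_map_count

# Raising it, e.g. for the current boot only (the value below is an
# illustrative guess, not a tuned MonetDB recommendation):
#   sudo sysctl -w vm.max_map_count=1048576
# To persist across reboots, add "vm.max_map_count = 1048576"
# to /etc/sysctl.conf (or a file under /etc/sysctl.d/).
```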

Thanks.


On Wed, Apr 4, 2018 at 9:18 AM Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Dear Rancho,

we have not yet been able to reproduce your problem.

Could you possibly please try to upgrade to the latest Mar2018 release
and check whether that works, or not, for you?

In case you cannot upgrade yet/now, or in case also Mar2018 fails
just like Jul2017-SP4 does,
could you possibly please try to start mserver5 with additional options
--set gdk_mmap_minsize_persistent=$[1<<33] --set gdk_mmap_minsize_transient=$[1<<33]

These raise the thresholds above which MonetDB uses mmap rather than malloc
for memory allocation for persistent and transient columns, respectively, to
8 GB (the defaults are 256 kB for persistent and 4 GB for transient).
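For reference, the $[1<<33] syntax is just (old-style) shell arithmetic; the sizes expand as follows (sketch using the portable $((...)) form):

```shell
echo $((1<<33))    # 8589934592 bytes = 8 GiB
echo $((1<<32))    # 4294967296 bytes = 4 GiB
echo $((256*1024)) # 262144 bytes = 256 kB, the persistent-column default
```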

To do so, you need to start mserver5 "by hand" rather than via monetdb(d).
If you normally use the latter, please check the merovingian.log for the
exact mserver5 command line to use with your dbfarm, and add the above options.
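Putting that together, a sketch of the hand-started invocation (the dbpath below is a placeholder; copy the exact command line, with all its flags, from merovingian.log for your setup; here the command is echoed rather than executed so the expanded sizes can be inspected):

```shell
# Build the extra options separately; 1<<33 = 8589934592 bytes = 8 GiB.
OPTS="--set gdk_mmap_minsize_persistent=$((1<<33)) --set gdk_mmap_minsize_transient=$((1<<33))"
echo mserver5 --dbpath=/path/to/dbfarm/yourdb $OPTS
```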

In case 8 GB ($[1<<33]) turns out to fail as well, try, say, 4 GB ($[1<<32]) instead.

We'd be curious to hear whether any of these work for you, or not.

Thanks!

Best,
Stefan

----- On Mar 26, 2018, at 8:45 AM, RanchoYuan yuanshijia@ww-it.cn wrote:

> Hi Stefan,
>
> Thanks for your reply!
>
>
> We have run the query a few times with different sizes of data. There we used 16 GB
> of RAM (actually 13.5 GB was used), and found that 10 GB of data is the critical
> point at which the query can still run. All of the data files' sizes are listed below;
> each file name is a table name (only a few tables are referred to --
> store_sales, date_dim, item, customer, catalog_sales, web_sales):
>
>
> 7.4K call_center.dat
> 1.6M catalog_page.dat
> 212M catalog_returns.dat
> 2.9G catalog_sales.dat
> 27M customer_address.dat
> 64M customer.dat
> 77M customer_demographics.dat
> 9.9M date_dim.dat
> 77B dbgen_version.dat
> 149K household_demographics.dat
> 328B income_band.dat
> 2.6G inventory.dat
> 28M item.dat
> 61K promotion.dat
> 1.7K reason.dat
> 1.1K ship_mode.dat
> 27K store.dat
> 323M store_returns.dat
> 3.8G store_sales.dat
> 4.9M time_dim.dat
> 1.2K warehouse.dat
> 19K web_page.dat
> 98M web_returns.dat
> 1.5G web_sales.dat
> 12K web_site.dat
>
> So we guess that MonetDB has no memory management?
>
>
> For the output of `mserver5 --version` is:
>
> MonetDB 5 server v11.27.13 "Jul2017-SP4" (64-bit, 128-bit integers)
> Copyright (c) 1993 - July 2008 CWI
> Copyright (c) August 2008 - 2018 MonetDB B.V., all rights reserved
> Visit https://www.monetdb.org/ for further information
> Found 17.0GiB available memory, 40 available cpu cores
> Libraries:
> libpcre: 8.38 2015-11-23 (compiled with 8.38)
> openssl: OpenSSL 1.0.2g 1 Mar 2016 (compiled with OpenSSL 1.0.2g 1 Mar 2016)
> libxml2: 2.9.3 (compiled with 2.9.3)
> Compiled by: monetdb@MonetDB-0.0 (x86_64-pc-linux-gnu)
> Compilation: gcc -g -O2
> Linking : /usr/bin/ld -m elf_x86_64
>
>
>
>
> And the size of processes is not limited.
>
>
>
>
> To let you reproduce the problem conveniently, I'll provide more details here:
>
> You can get TPC-DS from its website (we use version 2.6.0).
>
> Install TPC-DS, go to the directory v2.6.0/tools, and run `./dsdgen -scale
> 10 -dir /home/monetdb/tpc-ds_test_data10G` to generate the data.
>
> When the data has been generated, use the script expe.sh to create the tables and
> load the data. The query script is 123.tpcds.23.sql. (The syntax of the other queries
> that TPC-DS generates is not all suitable for MonetDB; we had not modified them all
> when the problem occurred.)
>
> One more question: I can't receive your reply emails, so I don't know how to reply
> to you; in this case, I can only send a new mail each time.
>
>
>
>
> Thanks!
>
> Regards,
>
> Rancho
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |