Hi Vojtech,

I don't have a magic recipe, but I can tell you what helped and what didn't in my case.

I didn't notice any difference from playing with the overcommit settings.

About OOM: it is never a good idea to try to avoid it. If the process is faulty and should not be using that much memory, it is better to kill it than to endanger the system. If the process is not faulty, then it really does need more memory, and there is no way around that.
I also don't think oom_score_adj is useful. Most likely MonetDB is the most memory-hungry process on your system, so there is no point in steering the killer toward a different process.

mem_reservation does indeed translate to gdk_mem_maxsize. MonetDB should try to adjust itself around this value, but it can very well go beyond it, and in your case it seems to go very much beyond. I don't know your workload, but either a bug is using more memory than needed (are you on a recent version?), or it really does need a lot of memory. What the memory is actually used for is not always obvious: I had twin servers where OOM kills occurred only on the one with more memory available, which made me suspect that memory used by the OS as buffer cache was somehow being counted as well. But I never really got to the bottom of that.
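By the way, if you want to double-check what limit mserver5 actually picked up inside the container, the env() call you already used can be filtered. Just a sketch; the LIKE pattern is only there to also catch related settings such as gdk_vm_maxsize if your version reports them:

-- check the memory budget mserver5 derived from the container limits;
-- gdk_mem_maxsize should come out at roughly 81.5% of mem_reservation
select * from env() where name like '%maxsize%';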

mem_limit: this has nothing to do with MonetDB (when used in combination with mem_reservation); it is simply the point at which the OOM killer steps in. I set it ridiculously high in my services, just to be sure the process is still killed eventually if it really goes out of control. Try increasing it: how far can you go? Does MonetDB take even more memory?
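To make that concrete, this is roughly the shape of the compose fragment I ended up with; it uses the same mem_reservation/mem_limit keys you already have, but the service name, image and numbers below are only placeholders, not a recommendation for your 30GB box:

version: "2.4"
services:
  monetdb:
    image: your-monetdb-image   # placeholder: whatever image you build/run today
    # soft limit: this is what mserver5 derives gdk_mem_maxsize from (~81.5% of it)
    mem_reservation: 14G
    # hard limit: kept deliberately higher than you expect to need,
    # only as a last-resort kill switch for a runaway process
    mem_limit: 28G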

I'm afraid I can't tell you much more. I didn't really solve all my issues, but I kept most under control with a combination of mem_reservation and mem_limit.

On Mon, 21 Jun 2021 at 18:36, Vojtech Kurka <v@meiro.io> wrote:
Hello Roberto, Sjoerd,

I think I am hitting the same problem described in this thread. I am running MonetDB in a Docker container on a server with 30GB of memory, allocating 21GB of it to the MonetDB container.

Sometimes, under heavier load, I get an OOM kill: the mserver5 process uses more memory than memory.limit_in_bytes for the container's cgroup. It looks like this:

[Mon Jun 21 12:56:36 2021] memory: usage 20971484kB, limit 20971520kB, failcnt 48706810
[Mon Jun 21 12:56:36 2021] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Mon Jun 21 12:56:36 2021] kmem: usage 143696kB, limit 9007199254740988kB, failcnt 0
...
[Mon Jun 21 12:56:36 2021] Tasks state (memory values in pages):
[Mon Jun 21 12:56:36 2021] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Mon Jun 21 12:56:36 2021] [  26927]     0 26927     1371      151    53248        0             0 run.sh
[Mon Jun 21 12:56:36 2021] [  27054]     0 27054   134763      326   139264        0             0 monetdbd
[Mon Jun 21 12:56:36 2021] [  27068]     0 27068 40646510  5209845 64159744        0             0 mserver5
[Mon Jun 21 12:56:36 2021] [  27289]     0 27289     1019       69    45056        0             0 tail
[Mon Jun 21 12:56:36 2021] [  29503]     0 29503     6425      547    86016        0             0 mclient
[Mon Jun 21 12:56:36 2021] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=d37591a31d4d010a7b4e34ebb19ba11220a84a264733f70a00ee92899c7918fc,mems_allowed=0,oom_memcg=/docker/d37591a31d4d010a7b4e34ebb19ba11220a84a264733f70a00ee92899c7918fc,task_memcg=/docker/d37591a31d4d010a7b4e34ebb19ba11220a84a264733f70a00ee92899c7918fc,task=mserver5,pid=27068,uid=0
[Mon Jun 21 12:56:36 2021] Memory cgroup out of memory: Killed process 27068 (mserver5) total-vm:162586040kB, anon-rss:20823812kB, file-rss:15568kB, shmem-rss:0kB, UID:0 pgtables:62656kB oom_score_adj:0

Have you made any progress in resolving the OOMs? 

Things I am trying to test (I am using docker-compose v2 to run the container, so the settings below come from there):

mem_limit: 21G # this sets memory.limit_in_bytes for the cgroup of the container
mem_reservation: 14G # this sets memory.soft_limit_in_bytes
oom_score_adj: -1000 # to lower the chance OOM will kill this process

Exploring "select * from env();" in MonetDB tells me that the gdk_mem_maxsize is 12251394211, which corresponds to the 81.5% of the mem_reservation, or memory.soft_limit_in_bytes. 

I am monitoring the memory usage of the container and it still spikes to 19G, which is uncomfortably close to the hard limit of the cgroup. Is this normal?

FYI, we are using vm.overcommit_memory = 2 and vm.overcommit_ratio = 100 (originally meant only for PostgreSQL, but it became our standard), in case that changes anything.

I would be thankful for any clue on how to run MonetDB in a container reliably, as right now it is causing me quite a headache. Running it on a host dedicated solely to MonetDB would be a last resort for me.

Thank you and have a great day!
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list