MonetDB is well-known for its eagerness to "eat up" all your memory. Fortunately, on Linux, there is a memory resource controller called cgroups, with which one can rein in that behaviour a bit. Let me share some experiences of MonetDB under cgroups here.
Perhaps it is a less known fact that Docker, LXC and other container technologies are implemented using Linux Namespace Isolation and Linux Control Groups, aka cgroups. The latter is a Linux kernel feature introduced in 2008. The main idea behind cgroups is to manage hardware and operating system resources for different groups of processes. Let’s for example say we want to run two database servers on a single host which has 64GB of RAM. If we want each database server to have some guaranteed access to half of this memory capacity, we can set up two special memory cgroups, group_1 and group_2, and put a memory limit of 32 GB on each of them and associate the database server processes with their respective cgroup.
Besides the memory controller there are other subsystems that are useful for database systems (although limiting CPU usage of MonetDB is much easier). For example the cpusets CGroup subsystem is a very useful subsystem to assign specific CPUs to specific processes. This can be used to divide the total number of threads equally among multiple database servers. There are CGroup subsystems that deal with labelling and prioritising network packages, (block) devices, snapshotting, CPU usage accounting, etc.
So why would you choose to use a direct cgroup approach instead of a managed container layer like Docker? Well I can imagine that in enterprise production environments certain, (overly) strict security policies prevent the usage of third party tools outside of the tools that come with the OS distribution and some well defined software inventory. Another reason could be that for simple use cases, manual cgroups configurations offers a more lightweight, transparent and dynamic way of implementing resource management. On the other hand, if you need high customisability, then again cgroups offers the most fine-grained access on Linux systems to resource management.
Setting up a cgroup for MonetDB
To keep it simple, we will only focus on the memory controller subsystem here, and how to get it done on my Ubuntu 18.04 (Bionic) machine which runs Linux kernel from version 4.15. Furthermore I am using cgroups version 1. There is also a more recent version 2 of cgroups which has a different architecture.
Let’s say that our MonetDB user is running mserver5 locally on his/her desktop. And the user wants to prevent MonetDB from eating all of the machine’s 4 GB RAM. So the user decides to limit mserver5 to cap physical memory usage to 2 GB.
My Ubuntu machine already comes with a couple of cgroup hierarchies for specific tasks. These hierarchies are just a set of directory trees. You can find the special file system called “cgroups” mounted on the root of each of these trees. The hierarchy that we are interested in is found at /sys/fs/cgroup/memory which is the preconfigured root of a cgroup hierarchy that controls memory limits. we can do something like
cd /sys/fs/cgroup/memory sudo mkdir monetdb-cgroup cd monetdb-cgroup
After the creation of the cgroup, notice that it is already filled with special files:
ls -rw-r--r-- 1 root root 0 Oct 30 11:19 cgroup.clone_children --w--w--w- 1 root root 0 Oct 30 11:19 cgroup.event_control -rw-r--r-- 1 root root 0 Oct 30 11:19 cgroup.procs -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.failcnt --w------- 1 root root 0 Oct 30 11:19 memory.force_empty -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.failcnt -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.limit_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.max_usage_in_bytes -r--r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.slabinfo -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.tcp.failcnt -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.tcp.limit_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.tcp.max_usage_in_bytes -r--r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.tcp.usage_in_bytes -r--r--r-- 1 root root 0 Oct 30 11:19 memory.kmem.usage_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.limit_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.max_usage_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.move_charge_at_immigrate -r--r--r-- 1 root root 0 Oct 30 11:19 memory.numa_stat -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.oom_control ---------- 1 root root 0 Oct 30 11:19 memory.pressure_level -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.soft_limit_in_bytes -r--r--r-- 1 root root 0 Oct 30 11:19 memory.stat -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.swappiness -r--r--r-- 1 root root 0 Oct 30 11:19 memory.usage_in_bytes -rw-r--r-- 1 root root 0 Oct 30 11:19 memory.use_hierarchy -rw-r--r-- 1 root root 0 Oct 30 11:19 notify_on_release -rw-r--r-- 1 root root 0 Oct 30 11:19 tasks
To put a limit on the memory usage, perform
sudo $(echo 2G > memory.limit_in_bytes)
Assuming that a single monetdb instance was already running, we can then simply add the corresponding mserver5 process to our newly created monetdb-cgroup:
sudo $(pgrep mserver5 > tasks)
Voila! The mserver5 process’ private physical memory usage is now bound by above commands to 2 GB. Even if mserver5 was using more than 2 GB of memory at the time of its cgroup association, all incoming allocations are then swapped out until the process is using less than its cgroup physical memory limit. This even works for sharable memory: as long as only a single process uses the potentially sharable memory pages (e.g. pages associated with memory mapped files), those pages are considered by the kernel to be private and charged to the process that uses them (the most). You can easily see the difference in memory usage using something like the top command.
I have to put up a disclaimer that this is the desired behaviour according to the official Linux kernel documentation and it is produced on the aforementioned Ubuntu system. But before this successful implementation, I have done a similar experiment with a slightly older Ubuntu distribution (16.04 xenial with kernel version 4.4; I believe). On that set up, the cgroup limit didn’t correctly limit the memory usage of MonetDB, because somehow it wasn’t accounting the sharable memory to the private memory of the database server process. I have no clue why this is the case and I am suspecting a bug of some sort in the older kernel.
In case your system doesn’t have a predefined cgroup hierarchy for configuring the memory controller, the steps to create a cgroup hierarchy are pretty straightforward and outlined in the above mentioned kernel documentation.
I have noticed that the directory that contains the roots of the cgroup hierarchies, always has the tmpfs file system mounted. I haven’t found a reason why this is the case or if it also works with another/inherited file system
sudo mkdir /sys/fs/cgroup # Doesn’t necessarily have to be this path sudo mount -t tmpfs none /sys/fs/cgroup cd /sys/fs/cgroup sudo mount -t cgroup none /sys/fs/cgroup/memory -o memory
To configure cgroups, you need root privileges. Either because you need to use the mount command or because predefined cgroups like /sys/fs/cgroup/memory are owned by root and have write-restrictions. However, after you have configured a (parent) cgroup you can change the ownership of that cgroup to a non-root user using chown.
Be aware that with cgroups version1, only one cgroup hierarchy can exist for any unique set of cgroups subsystems. So if we would create another hierarchy, say /sys/fs/cgroup/another-memory, that has only the memory controller as a subsystem (-o memory), then the corresponding mount command for another-memory will fail, since there is already an existing hierarchy with the exact same subsystem(s). But if we would mount it with two subsystems, say the memory AND cpuset controllers (-o cpuset,memory), the mount should succeed.
CGroups and OOM-killer
Some times when MonetDB hits the (memory) limit of its CGroup, it might get killed by the OOM-killer. Triggering the OOM-killer seems to be one strategy used by CGroups to enforce the memory limitation on the processes (see also  and ). To prevent MonetDB from being killed by the OOM-killer, one can set "oom_score_adj" to a very low value, e.g. -1000. Valid values of "oom_score_adj" are integers in the range of -1000 to 1000. The lower the value is, the lower the chance is that a process is going to be killed (see also ).