Memory FootprintMemory Footprint mk Mon, 05/04/2020 - 14:10
MonetDB requires all data that needs to be active at any given point in time to fit into the address space --- and of course to fit on the storage device, (i.e., your disk system). On 32-bit systems, the address space is at most 32-bit (4 GB); in practice, it is actually limited to 3 GB or even 2 GB on most systems. On 64-bit systems, the address space can theoretically be 64-bit, but in practice is often "limited" to 48-bit or so --- not that that makes any difference ...
MonetDB excessively uses main memory for processing, but does not require that all data fit in the available physical memory. To handle a dataset that exceeds the available physical memory, MonetDB does not (only) rely on the available swap space, but (also) uses memory-mapped files to exploit disk storage beyond the swap space as virtual memory.
For example, while bulk-loading data (preferably using a COPY INTO statement from a (possibly compressed) CSV file), MonetDB needs to have all columns of the table that is currently being loaded "active", i.e., accessible in the address space. However, during loading, parts of the data are continuously written to the persistent files on disk, i.e., the whole table does not have to fit into main memory. E.g., loading a 100 GB table works fine on a system with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk space.
During query processing, MonetDB requires for each single MAL operation during the query execution that all its inputs, its outputs, and possible temporary data structures fit in the address space. This won't be a problem on 64-bit machines, but may be experienced if you are close to the limit on 32-bit machines. MonetDB automatically resorts to virtual memory and memory-mapped files for large intermediate results. Also (large) persistent tables are accessed using memory mapping.
While running, you might see your mserver5 process' virtual size grow well beyond the available physical memory and possibly also well beyond your swap space. In principle, this is not a problem at all. Most of this virtual size is due to the fact that large base tables (or intermediate results) that reside as files on disk are memory-mapped into the address space. Those parts of the data that are currently not accessed do not consume any physical memory (except possibly for caching purposes).
However, if individual columns of your table(s) and/or individual columns of intermediate results exceed the size of the available physical memory, the performance of MonetDB might (will) decrease due to increased I/O requirements.