Disk space

Disk space mk Wed, 03/20/2013 - 22:44

The disk space footprint is determined by the way columns are being stored. MonetDB uses dictionary encoding for string columns, but aside from this there is no default compression applied to reduce the disk footprint. The prime reason being the two-headed sword of compression. It saves (cheap) disk space and IO bandwidth at the cost of expensive CPU (de)compression overhead (See compression). Since all columns are memory mapped upon access, i.e. they need not be decompressed.  If disk space comes at a premium and memory residency can be guaranteed for a long time, then a compressed file system, e.g. BTRFS, may become helpful. Its compression behavior is often as good as  dedicated algorithms executed within the critical path of a query execution.

The disk footprint can be assessed using the (Linux) command 'du' on the dbfarm directory or to run the query 'select * from storage();', provided the sql extensions are pre-loaded into your database. (See storage model)

Running out of diskspace

One of the features of MonetDB's execution model is that all intermediates are materialized as memory mapped files. A consequence of this approach is that when memory is too small to keep them around, they will be swapped to disk by the operating system. This can be seen as a decaying free space and ultimately a full disk. In turn this (should) lead to a single transaction abort and removing its disk claim. Evidently, filling your disk depends on the number of users and the complexity of their queries. It aids to the cost of running COPY INTO and queries concurrently.

If you have limited resources for concurrent access then the monetdb funnel may be an option to serialize the user requests.  A single server can have multiple funnels.