Thank you, Stefan.  This is extremely helpful.  In fact, this is the best description on how MonetDB uses memory that I have run across.

During a previous test with the same data, I did lift the virtual memory restriction on mserver5, and it did succeed in loading the data.  However, the downside was that it seemed to consume all of the physical memory on the server (a Solaris SPARC with 16 CPUs, 64 GB of RAM, and 100 GB of swap), so I needed a way to ensure other applications still had memory available.  I had previously asked how to limit the amount of memory used by MonetDB on the SourceForge tracker (probably the wrong forum) without getting any response.  Thus my use of ulimit.  Another way may be to use a Solaris zone or a virtual machine to contain MonetDB.
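For reference, `ulimit -v` sets RLIMIT_AS, the cap on a process's *total* virtual address space -- heap, swap-backed pages, and memory-mapped files alike -- which is why it also blocks MonetDB's memory-mapped files.  A small Python sketch of the same mechanism (MonetDB itself is C, and the 2 GB / 4 GB figures here are purely illustrative):

```python
# Illustrative sketch, not MonetDB code: ulimit -v corresponds to RLIMIT_AS,
# which caps a process's total virtual address space (heap, swap-backed
# pages, and memory-mapped files alike).
import resource

two_gb = 2 * 1024 ** 3
# Lower both the soft and the hard limit to 2 GB for this process.
resource.setrlimit(resource.RLIMIT_AS, (two_gb, two_gb))

try:
    buf = bytearray(4 * 1024 ** 3)  # 4 GB allocation: exceeds the cap
    result = "succeeded"
except MemoryError:
    # The kernel refuses the underlying mmap/brk, so the allocation fails
    # outright instead of pushing the process into swap.
    result = "refused"
print("4 GB allocation", result)
```

Note that the limit applies to address space, not to resident memory, so a generous `ulimit -v` still allows the mserver5 virtual size to grow via file mappings without necessarily consuming RAM.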

From what I understand from your description, in theory I can run MonetDB on a 64-bit server with combined RAM and swap space of a few *gigabytes*, and still succeed at loading *terabytes* of data *and* performing *arbitrarily* complex queries subsequently.  The only cost would be in performance.  I'd be willing to pay that performance cost if that is indeed the case.

I like what I have seen in MonetDB and would love to use it if (a) my understanding expressed in the previous paragraph is correct, and (b) I can find out how to limit the amount of RAM and swap that mserver5 uses without limiting its virtual address space.

Thanks again.

On Fri, Mar 19, 2010 at 7:59 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Hering,

MonetDB requires all data that needs to be active at any given point in time
to fit into the address space --- and, of course, to fit on the storage
device, i.e., your disk system.
On 32-bit systems, the address space is at most 32-bit, i.e., at most 4 GB;
in practice, it is actually limited to 3 GB or even 2 GB on most systems.
On 64-bit systems, the address space can theoretically be 64-bit, but in
practice is often "limited" to 48-bit or so --- not that that makes any
difference ...

MonetDB makes intensive use of main memory for processing, but does not
require that all data fit in the available physical memory. To handle
datasets that exceed the available physical memory, MonetDB does not (only)
rely on the available swap space, but (also) uses memory-mapped files to
exploit disk storage beyond the swap space as virtual memory.
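The memory-mapped-file mechanism can be sketched in a few lines (a Python illustration of the idea, not MonetDB internals; the file name and size are made up):

```python
# Sketch of the mechanism described above: back the data with an ordinary
# file and map it into the address space, so the disk file -- not swap --
# is the backing store. File name and size are illustrative.
import mmap
import os

path = "column.bin"
size = 1 << 20                      # 1 MB here; real columns can be far larger

with open(path, "wb") as f:         # create the persistent backing file
    f.truncate(size)

with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as m:
    m[:5] = b"hello"                # writes land in the page cache ...
    m.flush()                       # ... and are written back to the file

# Untouched pages of such a mapping consume no physical memory and can be
# dropped by the OS at any time, since they can be re-read from the file.
with open(path, "rb") as f:
    first = f.read(5)
os.remove(path)
print(first.decode())               # prints "hello"
```

Because the backing store is the file itself, the mapped data can exceed RAM plus swap combined; only the pages actually being touched need physical memory.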

For example, while bulk-loading data (preferably via a COPY INTO statement
from a (possibly compressed) CSV file), MonetDB needs to have all columns of
the table that is currently being loaded "active", i.e., accessible in the
address space. However, during loading, parts of the data are continuously
written to the persistent files on disk, i.e., the whole table does not have
to fit into main memory. E.g., loading a 100 GB table works fine on a system
with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk
space.

During query processing, MonetDB requires, for each operation during the
query execution, that all its inputs, its outputs, and possible temporary
data structures fit in the address space. MonetDB automatically resorts to
virtual memory and memory-mapped files for large intermediate results.
Also, (large) persistent tables are accessed using memory mapping.

While running, you might see your mserver5 process' virtual size grow well
beyond the available physical memory and possibly also well beyond your swap
space.  In principle, this is no problem at all.  Most of this virtual size
is due to the fact that large base tables (or intermediate results) that
reside as files on disk are memory-mapped into the address space. However,
those parts of the data that are currently not accessed do not consume any
physical memory (except possibly for caching purposes).

However, if individual columns of your table(s) and/or individual columns of
intermediate results exceed the size of the available physical memory, the
performance of MonetDB might (will) decrease due to increased I/O
requirements.

In your case, you indeed limit the virtual size (including memory-mapped
files) of your mserver5 to 32 GB, and hence cannot load your 115 GB table.

Did you also try without limiting the virtual size?
Did that work?

If that does not work, either, could you please report in detail what
happens (errors? crashes?) and share some details about your data, in
particular the schema of your table (number and types of columns; number of
records to be loaded; whether there are any constraints
(primary-/foreign-keys, etc.) defined on the table, etc.) --- knowing your
"create table" and "copy into" statements would help us a lot, then.

Hope this clarifies some things and helps you further.

Don't hesitate to ask in case it does not, or in case you have more questions.

Kind regards,
Stefan

On Fri, Mar 19, 2010 at 07:18:21AM -0700, Hering Cheng wrote:
> Hi,
>
> I am trying to understand the limitations of MonetDB with regard to memory.
> It seems that the amount of virtual memory (RAM plus swap space) available
> to MonetDB must be greater than the amount of data.  Before I give up on
> MonetDB, can someone please confirm or correct my understanding?  I did not
> understand the virtual memory requirement when I read the following in its
> documentation (
> http://monetdb.cwi.nl/MonetDB/Documentation/Size-Limitations.html#Size-Limitations
> ):
>
> *The maximal database size supported by MonetDB depends on the underlying
> processing platform, e.g., a 32- or 64-bit processor, and storage device,
> e.g., the file system and disk raids.
> *
>
> For example, I started up merovingian (using the Feb 2010 release), limiting
> it and its child processes to 32 GB of virtual memory (via ulimit -v).  I
> then tried to load 115 GB of data into a single table.  mclient reported the
> following:
>
> MAPI  = chenher@myserver:50000
> COPY 472112651 RECORDS INTO mytable FROM 'mydata.txt' DELIMITERS '|', '\n';
> ERROR = !SQLException:importTable:failed to import table
>         !ERROR: HEAPextend: failed to extend to 1888450604 for 17/1734tail
>         !ERROR: TABLETcreate_bats: Failed to create bat of size 472112651
>
> I understand if MonetDB must be able to fit all its data within virtual
> memory.  But if this is the case, then unfortunately I cannot use it, since
> I have terabytes of data and I do not have that much virtual memory on my
> hardware.
>
> Thank you.
> Hering


> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users


--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |