Thank you, Stefan.  You have single-handedly restored my fondness in MonetDB.  Now I will resume my evaluation.

On Fri, Mar 19, 2010 at 9:01 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
On Fri, Mar 19, 2010 at 08:27:34AM -0700, Hering Cheng wrote:
> Thank you, Stefan.  This is extremely helpful.  In fact, this is the best
> description on how MonetDB uses memory that I have run across.
>
> During a previous test with the same data, I did lift the virtual memory
> restriction on mserver5 and it did succeed in loading the data.  However,
> the downside is that it seemed to end up consuming all of the physical
> memory on the server (a Solaris SPARC with 16 CPUs, 64 GB of RAM and 100 GB

Well, yes. For large/huge databases, we (MonetDB) assume a dedicated
database server, and hence feel free to indeed use the available memory to
acheive best performance ;-)

> of swap).  So I needed a way to ensure other applications still have
> memory.  I had previously asked a question on how to limit the amount of
> memory used by MonetDB in Sourceforge Tracker (probably the wrong forum)
> without getting any response.  Thus my use of ulimit.  Another way may be to
> use Solaris zone or a virtual machine to contain MonetDB.

In that case, you might want to limit the process' memory resident data
segment, only, not it's total virtual memory footprint (incl. memory mapped
files), i.e., use `ulimit -d` instead of `ulimit -v`:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
^^^^^^^^^^^^^                        ^^^^^^^^^
[...]

$ mserver5 --version
MonetDB server v5.18.2 (64-bit), based on kernel v1.36.2 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 7.8GiB available memory, 4 available cpu cores
     ^^^^^^
[...]

$ ulimit -d $[1024*1024]

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 1048576
^^^^^^^^^^^^^                        ^^^^^^^
[...]

$ mserver5 --version
MonetDB server v5.18.2 (64-bit), based on kernel v1.36.2 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 1.0GiB available memory, 4 available cpu cores
     ^^^^^^

> >From what I understand from your description, in theory I can run MonetDB on
> a 64-bit server with combined RAM and swap space of a few *gigabytes*, and
> still succeed at loading *terabytes* of data *and* performing *arbitrarily*
> complex queries subsequently.  The only cost would be in performance.  I'd
> be willing to pay for that performance cost if that is indeed the case.

In theory, yes ... (In both theory and practice, you'll need a disk systems
that has sufficient capacity for your "terabytes".)

Note though, that the performance penalty for columns that exceeding
physical memory depends (also) on the performance of your I/O system and the
efficiency of your OS's mmap implementation.

> I like what I have seen in MonetDB and would love to use it if (a) my
> understanding expressed in the previous paragraph is correct, and (b) I can
> find out how to limit the amount of RAM and swap that mserver5 uses without
> limiting its virtual address space.

See above. We (MonetDB) try to avoid using swap as much as possible by
allocating all chucks larger than 128 MB from memory mapped files.  In case
MonetDB still uses "too much" swap in your case, please report the detail
and we might find a way to tune the swap usage of MonetDB.

Stefan

> Thanks again.
>
> On Fri, Mar 19, 2010 at 7:59 AM, Stefan Manegold <Stefan.Manegold@cwi.nl>wrote:
>
> > Hering,
> >
> > MonetDB requires all data that needs to be active at any given point in
> > time
> > to fit into the address space --- and of course to fit on the storage
> > device, i.e., your disk system).
> > On 32-bit systems, the address space is at most 32-bit, i.e., at most 4 GB;
> > in practice, it is actually limited to 3 GB or even 2 GB on most systems.
> > On 64-bit systems, the address space can theoritcally be 64-bit, but in
> > pactice is often "limited" to 48-bit or so --- not that that makes any
> > difference ...
> >
> > MonetDB excessively uses main memory for processing, but does not require
> > that all data fit in the available physical memory. To handle dataset that
> > exceed the available physical memory, MonetDB does not (only) rely on the
> > available swap space, but (also) uses memory-mapped files to exploit disk
> > storage beyond the swap space as virtual memory.
> >
> > For example, while bulk-loading data (preferably via a COPY INTO statements
> > from a (possibly compressed) CSV file), MonetDB need to have all column of
> > the table that is currently being loaded "active", i.e., accessable in the
> > address space. However, during loading, parts of the data are continuously
> > written to the persisten files on disk, i.e., the whole table does not have
> > to fit into main memory. E.g., loading a 100 GB table works fine on a
> > system
> > with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk
> > space.
> >
> > During query processing, MonetDB requires for each operation during the
> > query execution that all its inputs, its outputs, and possible temporary
> > data structure fit in the address space. MonetDB automatically resorts to
> > virtual memory and memory mapped files for large intermedate results.
> > Also (large) persitent tables are accessed using memory mapping.
> >
> > While running, you might see your mserver5 process' virtual size grow well
> > beyond the available physical memory and possibly also well beyond your
> > swap
> > space.  In principle, this is no problem at all.  Most of this virtual size
> > is due to the fact that large base tables (or intermediate results) that
> > rised as files on disk are memory-mapped into the address space. However,
> > those parts of the data that are currently not accessed do not consume any
> > physical memory (except possible for caching purposes).
> >
> > However, if individual columns of your table(s) and/or indivdual columns of
> > intermediate results exceed the size of the available physical memory, the
> > performance of MonetDB might (will) decrease due to increased I/O
> > requirements.
> >
> > In your case, you indeed limit the virtual size (including memory-mapped
> > files) of your mserver5 to 32 GB, and hence cannot load your 115 GB table.
> >
> > Did you also try without limiting the virtual size?
> > Did that work?
> >
> > If that does not work, either, could you please report in detail what
> > happends (errors? crashes?) and share some details about your data, in
> > particular the schema of your table (number and types of columns; number of
> > records to be loaded; whether there are any constraints
> > (primary-/foreign-keys, etc.) defined on the table, etc.) --- knowing your
> > "create table" and "copy into" statements would help us a lot, then.
> >
> > Hope this clearifies some things and helps you further.
> >
> > Don't hesitate to ask, in case not, or in case you have more questions.
> >
> > Kind regards,
> > Stefan
> >
> > On Fri, Mar 19, 2010 at 07:18:21AM -0700, Hering Cheng wrote:
> > > Hi,
> > >
> > > I am trying to understand the limitations of MonetDB with regard to
> > memory.
> > > It seems that the amount of virtual memory (RAM plus swap space)
> > available
> > > to MonetDB must be greater than the amount of data.  Before I give up on
> > > MonetDB, can someone please confirm or correct my understanding?  I did
> > not
> > > understand the virtual memory requirement when I read the following in
> > its
> > > documentation (
> > >
> > http://monetdb.cwi.nl/MonetDB/Documentation/Size-Limitations.html#Size-Limitations
> > > ):
> > >
> > > *The maximal database size supported by MonetBD depends on the underlying
> > > processing platform, e.g., a 32- or 64-bit processor, and storage device,
> > > e.g., the file system and disk raids.
> > > *
> > >
> > > For example, I started up merovingian (using the Feb 2010 release),
> > limiting
> > > it and its child processes to 32 GB of virtual memory (via ulimit -v).  I
> > > then try to load 115 GB of data into a single table.  mclient reports the
> > > following:
> > >
> > > MAPI  = chenher@myserver50000
> > > COPY 472112651 RECORDS INTO mytable FROM 'mydata.txt' DELIMITERS '|',
> > '\n';
> > > ERROR = !SQLException:importTable:failed to import table
> > >         !ERROR: HEAPextend: failed to extend to 1888450604 for
> > 17/1734tail
> > >         !ERROR: TABLETcreate_bats: Failed to create bat of size 472112651
> > >
> > > I understand if MonetDB must be able to fit all its data within virtual
> > > memory.  But if this is the case, then unfortunately I cannot use it,
> > since
> > > I have terabytes of data and I do not have that much virtual memory on my
> > > hardware.
> > >
> > > Thank you.
> > > Hering
> >
> > >
> > ------------------------------------------------------------------------------
> > > Download Intel&#174; Parallel Studio Eval
> > > Try the new software tools for yourself. Speed compiling, find bugs
> > > proactively, and fine-tune applications for parallel performance.
> > > See why Intel Parallel Studio got high marks during beta.
> > > http://p.sf.net/sfu/intel-sw-dev
> >
> > > _______________________________________________
> > > MonetDB-users mailing list
> > > MonetDB-users@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/monetdb-users
> >
> >
> > --
> > | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
> > | CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/<http://www.cwi.nl/%7Emanegold/> |
> > | 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
> > | The Netherlands     | Fax : +31 (20) 592-4199       |
> >

--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |