[Monetdb-developers] [MonetDB-users] SQL server startup time problem (undo/redo problem?)

James Laken james.laken at gmail.com
Tue Jun 19 12:13:15 CEST 2007

Dear Martin,

> Can you specify the system you are working on, e.g. hardware/os.

The hardware is a dual-core 2.2GHz AMD64, 4G RAM, 8 disk raid0 (256k 
chunk size) with XFS filesystem for the database. The OS is a 64bit 
Debian Etch.

> You are really beating the system. I don't know a system that
> can handle a TPC-H SF100 out of the box. This size often
> require a multistage process and careful setting of the system
> parameters.
 > But, let's see what you have done and learn from your experience.

I didn't expect sub millisecond response times for the tpc-h queries out 
of box, so I am open for any system or database setting modification.

I have noticed that the MonetDB initial memory usage is slightly lower 
when I changed the kernel parameter vm/overcommit_memory from 2 to 0.

> Did you stop and restart the server between the loads? If not, then
> from a recovery point of view all 420M are stored in a single log file
> and become the target of a single reload. It behaves as if you loaded
> the 7x60M as a single batch.

I didn't try to restart the server on every slice load - so that was the 
problem. Is there any way to control the checkpoint process? Stopping 
and restarting the server is well ... not the most elegant way to 
trigger a checkpoint.

> Killing a database process is of course hard. In that case, the
> recovery process has to reload the data and enters a really expensive
> part of TPC-H: ensure correctness of the integrity relationships.
> Protection against this is hard, because it requires that integrity
> rules enforcement should either be disabled (the method persued in MySQL).

Is there any formula that can predict the recovery time after killing 
the database process based on the log sizes or other parameters?

It would be good to have a bulk data load method without integrity rules 
enforcement - similar to pg_bulkload [1] in postgres. Using the same 
database schema as in MonetDB I have loaded the whole TPC-H dataset to 
postgres within 90 minutes (~19M/sec, which is still pretty slow imho).

>> to restart the server process but after three hours of intensive 
>> processing the sql module still not started. Please note that the 
>> initialization process allocated nearly all memory and swap.
> This is what we expect. Your tables require a lot of space, because
> MonetDB does not automatically partitioning it. (That's scheduled
> for an upcoming release ;-))

Hmm, looks pretty interesting. Any plan for reasonably sized (eg.: 64M) 
table slice partitioning across several machines? I am trying to build a 
scalable log analysis framework (append only, read mostly data flow) so 
this feature would help a lot.


[1] http://pgbulkload.projects.postgresql.org/

More information about the developers-list mailing list