[Monetdb-developers] [MonetDB-users] SQL server startup time problem (undo/redo problem?)

Martin Kersten Martin.Kersten at cwi.nl
Wed Jun 20 20:46:12 CEST 2007


Dear James,

thanks for the info and suggestions.

James Laken wrote:
> Dear Martin,
> 
>> Can you specify the system you are working on, e.g. hardware/os.
> 
> The hardware is a dual-core 2.2GHz AMD64, 4G RAM, 8 disk raid0 (256k 
> chunk size) with XFS filesystem for the database. The OS is a 64bit 
> Debian Etch.
> 
>> You are really beating the system. I don't know a system that
>> can handle a TPC-H SF100 out of the box. This size often
>> require a multistage process and careful setting of the system
>> parameters.
>> But, let's see what you have done and learn from your experience.
> 
> I didn't expect sub-millisecond response times for the TPC-H queries out
> of the box, so I am open to any system or database setting modification.
> 
> I have noticed that MonetDB's initial memory usage is slightly lower
> when I change the kernel parameter vm/overcommit_memory from 2 to 0.
Wow, you went deep. And that is useful information.
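For other readers of the list, the knob in question is an ordinary Linux
sysctl, so nothing MonetDB-specific is involved:

  # apply immediately (lost at reboot)
  sysctl -w vm.overcommit_memory=0
  # or make it persistent by adding to /etc/sysctl.conf:
  #   vm.overcommit_memory = 0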
> 
>> Did you stop and restart the server between the loads? If not, then
>> from a recovery point of view all 420M are stored in a single log file
>> and become the target of a single reload. It behaves as if you loaded
>> the 7x60M as a single batch.
> 
> I didn't try to restart the server after every slice load - so that was
> the problem. Is there any way to control the checkpoint process? Stopping
> and restarting the server is, well ... not the most elegant way to
> trigger a checkpoint.
This is indeed a useful feature to add. You can file it as a feature request
on the corresponding list. We will probably need it internally shortly as
well, which pushes it up our development priority list.
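Until such a call exists, the workaround is roughly the loop below; the
client flag and the start/stop scripts are illustrative only, not something
we ship:

  # load one ~60M-row slice at a time and cycle the server in between,
  # so that each log file (and thus each reload on restart) stays small
  for slice in 1 2 3 4 5 6 7; do
      mclient -lsql load_slice_$slice.sql   # COPY INTO for one slice
      ./stop_mserver.sh                     # clean shutdown of the server
      ./start_mserver.sh                    # restart handles only this small log
  done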
> 
>> Killing a database process is of course hard. In that case, the
>> recovery process has to reload the data and enters a really expensive
>> part of TPC-H: ensuring correctness of the integrity relationships.
>> Protecting against this is hard, because it requires that integrity
>> rule enforcement be disabled (the method pursued in MySQL).
> 
> Is there any formula that can predict the recovery time after killing 
> the database process based on the log sizes or other parameters?
We haven't studied this area so far. The way it is done after a
COPY is to put the backup BATs in place at practically no cost. What delays
things is that upon recovery you always have to re-check
integrity. We are aware that this is overly expensive in some cases,
and a bug report has been filed to keep it on our radar.
> 
> It would be good to have a bulk data load method without integrity rule
> enforcement - similar to pg_bulkload [1] in PostgreSQL. Using the same
> database schema as in MonetDB, I have loaded the whole TPC-H dataset into
> PostgreSQL within 90 minutes (~19 MB/s, which is still pretty slow imho).
Indeed, in sql/src/benchmarks/tpch you'll notice different load scripts
with and without integrity enforcement. A separate bulk-load facility
is relatively easy to construct if you know the mapping of SQL tables
to the underlying BATs. BUT, it somewhat violates the basic assumptions
about what the system should do.
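To give a rough impression, the constraint-free variant boils down to the
three steps below (file names and the exact client invocation are
illustrative; the scripts in the source tree are authoritative):

  # 1) create the table without PRIMARY KEY / FOREIGN KEY clauses
  mclient -lsql create_nokeys.sql
  # 2) bulk load; no per-row integrity checks fire during the COPY, e.g.
  #      COPY INTO lineitem FROM '/data/tpch/lineitem.tbl'
  #           USING DELIMITERS '|', '\n';
  mclient -lsql load_lineitem.sql
  # 3) enforce integrity once, after the data is in, e.g.
  #      ALTER TABLE lineitem ADD CONSTRAINT lineitem_pk
  #            PRIMARY KEY (l_orderkey, l_linenumber);
  mclient -lsql add_keys.sql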
> 
>>> to restart the server process, but after three hours of intensive
>>> processing the sql module had still not started. Please note that the
>>> initialization process allocated nearly all memory and swap.
>> This is what we expect. Your tables require a lot of space, because
>> MonetDB does not automatically partition them. (That's scheduled
>> for an upcoming release ;-))
> 
> Hmm, that looks pretty interesting. Are there any plans for reasonably
> sized (e.g. 64M) table slice partitioning across several machines? I am
> trying to build a scalable log analysis framework (append-only,
> read-mostly data flow), so this feature would help a lot.
We are aiming for adaptive segmentation based on query load.
See e.g. the papers on Crackers.
> 
> Regards,
> J.
> 
> [1] http://pgbulkload.projects.postgresql.org/




