[Monetdb-developers] Performance of COPY INTO: data source & constraint checking

Roberto Cornacchia R.Cornacchia at cwi.nl
Fri Jul 24 12:53:55 CEST 2009


I come back to this issue with some more interesting results.

I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be
a lot slower than 'COPY INTO ... FROM <filename> ...'

Actually, I have just found something that may shift the problem away
from the stdin itself.

Look at the following two methods of restoring a previously dumped
(rather small) database. 

RESTORE METHOD 1 (far too slow):

$ mclient < backup.sql
[ 62	]
[ 70199	]
[ 288	]
[ 70322	]
[ 69	]
[ 423	]
[ 280596	]
[ 4302	]
[ 49038	]
[ 1	]
[ 10	]
[ 6943	]
[ 6943	]
[ 3424	]
[ 3424	]
[ 3424	]
[ 1	]
[ 6943	]
[ 3424	]
TIME elapsed: 68 seconds


RESTORE METHOD 2:

$ echo "\< backup.sql" | mclient
[ 62	]
[ 70199	]
[ 288	]
[ 70322	]
[ 69	]
[ 423	]
[ 280596	]
[ 4302	]
[ 49038	]
[ 1	]
[ 10	]
[ 6943	]
[ 6943	]
[ 3424	]
[ 3424	]
[ 3424	]
[ 1	]
[ 6943	]
[ 3424	]
TIME elapsed: 1 seconds

Can anyone explain this difference?

One further consideration: Method 2 works fine, but cannot be used to
load gzipped dumps. Maybe this can be a feature request: '\<' should
accept gzipped input.

Roberto


On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
> On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
> > On Mon, 18 May 2009, Stefan Manegold wrote:
> > 
> > > In case you came to your conclusion from experimental/emperical study, I
> > > would be very interested to know about the experiemntes performed and the
> > > results measured.
> > > In that case, I would be verythankful if you could share the informative
> > > details with us, i.e. detailed descriptions of
> > > - experimental setup (HW, SW, datasets, workload/queries)
> > > - measured results (what was measured how, and what are the results?)
> > > - (your) discussion/analysis/interpretation of the results
> > 
> > I thought the MADAM meeting was for this.
> 
> Ok. fine.
> 
> ... we should then revive the ideas of keeping note of MADAMs (e.g., on a
> Wiki or alike) as reference for those who cannot be present ...
> 
> > > Could you please elaborate on whether you consider you setup/schema
> > > "representaive" and/or whether/to which extend you think that your results
> > > for one specific case could be generalized to other cases?
> > 
> > Technically the only thing I have to prove to get my code reverted is that
> > there are cases where late contraints hurt performance.
> 
> IMHO, our task should be the following:
> 
> - analyze whether there is indeed a significat performance difference
>   between early and late constraint checking;
>   (5% do not justify any further time to be spent on this)
> - and if so, analyze under which circumstances which altertative performs
>   (significatly) better or worse than the other;
> - and (only) then try to find out where the (significat) performance
>   difference comes from
> - and (only) then 
>   + either try to eliminate the difference;
>   + or 
>     * make dump/restore use the most suitable alternative (as far as
>       possible with reasonable effort);
>     * and document our experiences to give adivce for users
> 
> Stefan
> 
> > 
> > Stefan
> > 
> 
-- 
| M.Sc. Roberto Cornacchia
| CWI (Centrum voor Wiskunde en Informatica)
| Science Park 123, 1098XG Amsterdam, The Netherlands
| tel: +31 20 592 4322 , http://www.cwi.nl/~roberto





More information about the developers-list mailing list