[Monetdb-developers] Performance of COPY INTO: data source & constraint checking

Sjoerd Mullender sjoerd at acm.org
Fri Jul 24 14:30:40 CEST 2009


Roberto Cornacchia wrote:
> I come back to this issue with some more interesting results.
> 
> I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be
> a lot slower than 'COPY INTO ... FROM <filename> ...'
> 
> Actually, I have just found something that may shift the problem away
> from the stdin itself.
> 
> Look at the following two methods of restoring a previously dumped
> (rather small) database. 
> 
> RESTORE METHOD 1 (far too slow):
> 
> $ mclient < backup.sql
> [ 62	]
> [ 70199	]
> [ 288	]
> [ 70322	]
> [ 69	]
> [ 423	]
> [ 280596	]
> [ 4302	]
> [ 49038	]
> [ 1	]
> [ 10	]
> [ 6943	]
> [ 6943	]
> [ 3424	]
> [ 3424	]
> [ 3424	]
> [ 1	]
> [ 6943	]
> [ 3424	]
> TIME elapsed: 68 seconds
> 
> 
> RESTORE METHOD 2:
> 
> $ echo "\< backup.sql" | mclient
> [ 62	]
> [ 70199	]
> [ 288	]
> [ 70322	]
> [ 69	]
> [ 423	]
> [ 280596	]
> [ 4302	]
> [ 49038	]
> [ 1	]
> [ 10	]
> [ 6943	]
> [ 6943	]
> [ 3424	]
> [ 3424	]
> [ 3424	]
> [ 1	]
> [ 6943	]
> [ 3424	]
> TIME elapsed: 1 seconds
> 
> Can anyone explain this difference?

When mclient reads from standard input, it reads a line at a time.  When
it reads from file (either using the \< construct or as a command line
argument (i.e. without <)) it reads in large chunks.

> One further consideration: Method 2 works fine, but cannot be used to
> load gzipped dumps. Maybe this can be a feature request: '\<' should
> accept gzipped input.

Patches are gratefully accepted.  ;-)

> Roberto
> 
> 
> On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
>> On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
>>> On Mon, 18 May 2009, Stefan Manegold wrote:
>>>
>>>> In case you came to your conclusion from experimental/emperical study, I
>>>> would be very interested to know about the experiemntes performed and the
>>>> results measured.
>>>> In that case, I would be verythankful if you could share the informative
>>>> details with us, i.e. detailed descriptions of
>>>> - experimental setup (HW, SW, datasets, workload/queries)
>>>> - measured results (what was measured how, and what are the results?)
>>>> - (your) discussion/analysis/interpretation of the results
>>> I thought the MADAM meeting was for this.
>> Ok. fine.
>>
>> ... we should then revive the ideas of keeping note of MADAMs (e.g., on a
>> Wiki or alike) as reference for those who cannot be present ...
>>
>>>> Could you please elaborate on whether you consider you setup/schema
>>>> "representaive" and/or whether/to which extend you think that your results
>>>> for one specific case could be generalized to other cases?
>>> Technically the only thing I have to prove to get my code reverted is that
>>> there are cases where late contraints hurt performance.
>> IMHO, our task should be the following:
>>
>> - analyze whether there is indeed a significat performance difference
>>   between early and late constraint checking;
>>   (5% do not justify any further time to be spent on this)
>> - and if so, analyze under which circumstances which altertative performs
>>   (significatly) better or worse than the other;
>> - and (only) then try to find out where the (significat) performance
>>   difference comes from
>> - and (only) then 
>>   + either try to eliminate the difference;
>>   + or 
>>     * make dump/restore use the most suitable alternative (as far as
>>       possible with reasonable effort);
>>     * and document our experiences to give adivce for users
>>
>> Stefan
>>
>>> Stefan
>>>


-- 
Sjoerd Mullender

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 369 bytes
Desc: OpenPGP digital signature
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20090724/468566c2/attachment.sig>


More information about the developers-list mailing list