Monetdb copy binary time varys very much!

Stefan Manegold Stefan.Manegold at cwi.nl
Fri Jul 26 17:10:50 CEST 2013


Hi,

I'm not sure whether I understand correct what you are doing.

If you repeat the test 1000 times, does that mean that (1) 10000 times you
re-create (or empty) the table and thus always copy into an empty table, or
(2) 10000 times you copy into the same (growing) table, i.e., resulting in a
table of 10,000 times 200,000 rows, i.e., 2,000,000,000 rows, i.e., ~16 GB
per column, i.e., ~336 GB in total?

(Only) in case (1) the binary files to be imported are simply moved at zero
costs.
In case (2), only the first copy into (into the empty table) can simply move
the files at zero costs; all subsequent copy into (into a no longe empty
table) must copy the files (and delete them afterwards to mimic the same
behavior as the initial copy into), which is of cause not "for free".

Also, as Martin explained, unless your machine has (significantly) more RAM
than the ~336 GB of data you copy, the data needs to be written to disk in
between, making some copy into's "slower" than others. There's not much to
do about that other than (a) getting more RAM, or (b) improving I/O
bandwidth by using either a high performance RAID or SSDs.

Stefan

On Fri, Jul 26, 2013 at 04:40:37PM +0800, integrity wrote:
> Hi dear all,
> 
> 
> I did a test of insert 2,000,000,000 rows into MonetDB with "COPY BINARY INTO FROM binary_file",
> in this test
> 1.  i generate 21 files, each file represents a table column and has 200000 rows.
> I created them with rand() number and use fwrite() binary write method.
> 
> 
> the table creation sql command is :
> create table tmatch(id bigint,a float,b float,c float, d float, e float,
> f float,g float,h float, i float,j float,
> k float,l float,m float, n float,o float,
> p float,q float,r float, s float,t float);
> 
> the table has 21 columns,each column has 8 bytes, so each column file is c1=200000*21*8 Byte= 268800000 Byte=3.2MB
> 2. I use "COPY BINARY INTO FROM above_binary" to load each binary file into tmatch.
> The test was run 10000 times repeatedly.
> 
> 
> the average time of 10000 times is only 1.0635589558955727,
> but when at 9043th time,  it cost 227m36.136 ,some times later the time value increase to a large number,  is it because of flush data from cache into database after the cache is full?
> The problem is since we have to control the total process time within 15 seconds , I am wondering if you can help me reduce the maximum time to a lower point?
> 
> 
> 
> 
> Thanks very much!
> Meng

> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> http://mail.monetdb.org/mailman/listinfo/users-list


-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
|  www.CWI.nl/~manegold  | Science Park 123 (L321) |
|   +31 (0)20 592-4212   | 1098 XG Amsterdam  (NL) |



More information about the users-list mailing list