how to speed up the loading of data into a table

Stefan Manegold Stefan.Manegold at cwi.nl
Thu Mar 28 20:26:20 CET 2013


On Thu, Mar 28, 2013 at 06:41:08PM +0000, Bouslah Ahmed wrote:
> Good afternoon,
> 
> I have 2 questions :
> - 1st question 
> I have a csv file with 300000000 million lines.
                         ^^^^^^^^^^^^^^^^^
Is that   300000000         aka. 300 million aka. 3*10^8
or indeed 300000000 million aka. 300 billion aka. 3*10^14 ?

> when I load it in a MonetDB table, it takes more than 16 hours, whereas in Oracle it only takes 15 minutes. Where is the problem?
> I use mclient as follows :?
> mclient -u user1 -d demo -ims -s " COPY 300000009 RECORDS INTO LINEORDER FROM STDIN USING DELIMITERS '|','\n'" - < D:\SSB\LINEORDER.tbl

Is the client running on the same machine as the server?

Did/could you try my earlier suggestion of loading directly from file on the
server side (requires to use default admin user "monetdb"), as opposed to
from STDIN, i.e., passing all data through mclient?

> ?for information I use :
> 
> MonetDB5

Which release? Feb2013-SP1?

> Windows
> Memory : 16 Go

How many CPU cores / HW threads?

> There are two constraints : primary key and foreign key on the table prior to loading data?

Did/could you try creating the table without constraints, load the data, and
then add the constraints via ALTER TABLE statements.
Measure the time for both loading the data and adding the constraints
separately.

With MonetDB Feb2013-SP1, on an 8-core 16 GB Linux machine, we load the lineorder table
of SSBM SF-100 (~600 million rows) in ~30 minutes, and create (and validate)
the keys in ~60 minutes).

> 2nd question :
> how inner joins are made in MongoDB?
                              ^^^^^^^
I have no idea. Please ask the MongoDB people.

In MonetDB, we usually use hash joins.

> cordialy.

Stefan

-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
|  www.CWI.nl/~manegold  | Science Park 123 (L321) |
|   +31 (0)20 592-4212   | 1098 XG Amsterdam  (NL) |



More information about the developers-list mailing list