Failing to load a table

Stefan Manegold Stefan.Manegold at cwi.nl
Thu Jul 4 15:12:39 CEST 2013


On Thu, Jul 04, 2013 at 02:41:38PM +0200, Franck Routier wrote:
> Hi all,
> 
> I am trying to copy a table from a postgresql database into monetdb,
> but it fails, with monetdb filling the disks.
> I am using latest version of Monetdb (Feb2013-SP3) on Ubuntu Linux,
> with 32 GB Ram and SSD disks.
> 
> To copy into Monetdb, I have a java program that runs mclient as:
> 
> mclient --language=sql --databas=database --host=host --user=user
> /tmp/command.ctl
> 
> with command.ctl :
> 
> COPY nnn RECORDS INTO table FROM STDIN USING DELIMITERS'|', '\n',
> '\' NULL AS ''
> 
> Then the Postgresql resultset is read from jdbc, formatted and
> written into STDIN.
> So far so good, it work with my dimensions tables (small), and with
> a 295 millions rows fact table that has 109 columns.
> This fact table takes 86 GB on the disk for Postgresql storage (plus
> indices).
> 
> I have another form of the fact table, with less rows (184 millions)
> but more columns (505).
> While the first form of the table has a measureId column and only
> one value column, the other form of the table has one value column
> per measure.
> (It is a "sum(case when measureId=X then value else null ... group
> by" version of the first table).
> 
> Loading this "columnar" table fails, with monetdb filling the disks
> (something like 500 GB was left before the loading process) and
> quitting (not sure what happens exactly here on the server, but I
> get a broken pipe error in java, and the monetdb database is lost).

Please check what the monetdbd / merovingian log file says.

> The table size on disk for Postgresql is 37 GB.
> Using VectorWise, the table loads, and take only 7 GB on disks, but
> the loading process (very similar to what we do with MonetDB) takes
> a very long time( 25 hours), much longer than what happens with the
> 109 columns tables.
> 
> I am trying to make sense of that and understand why this table is
> giving a (very) hard time to monetdb (and somehow VectorWise as
> well).
> Does anyone have an explanation ?

Can you share the schemas (column types) of both tables?
For each type used the number of column with that type might be sufficient.
If there are no string columns at all, that would be sufficient info for us
to estimate the expected storage size.

E.g., if all columns were 4 byt integers, your first fact table would take
295 M * 109 * 4 ~= 128 GB, while your second table would take
184 M * 505 * 4 ~= 372 GB (all withou constraints & indexes).

Other than duplicate elimination (dictionary encoding) for string columns,
MonetDB does not yet use compressed storage.  We do have good experiences
with compressed file systems, though.

Do you have constraints and/or keys defined on your tables prior to bulk
loading?
If so, try bulk loading without constraints/keys defined, and add them
afterwards via alter table statements.

Best,
Stefan

> thanks,
> Franck
> 
> 



> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> http://mail.monetdb.org/mailman/listinfo/users-list


-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
|  www.CWI.nl/~manegold  | Science Park 123 (L321) |
|   +31 (0)20 592-4212   | 1098 XG Amsterdam  (NL) |



More information about the users-list mailing list