A good strategy for loading bulk data into monetdb

Martin Kersten martin.kersten at cwi.nl
Mon Apr 25 11:54:03 CEST 2016


Hi

To avoid the logs you should use the LOCKED version.
see https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto



On 25/04/16 11:45, Knezevic Nikola wrote:
> Hi all,
>
> I'm trying to load large amounts of data into monetdb (migrating from mysql). Since I'm using Jul2015-SP4 release, I can't create local files and I can't use external tools, my approach is as following:
> For each table, I partition the data based on the day, and then:
> 10 create one table in monetdb for each day, and a merge table that takes all these tables
> 20 fetch data from mysql (mysql -B -N -C ...)
> 30 _pipe_ data to mclient (that is invoked with mclient ... -c "COPY 2000000 RECORDS FROM STDIN INTO ..." -)
> 40 load data into corresponding table (the right day)
> 50 sleep some seconds
> 60 goto 20
>
> Each daily table is around 6'000'000 rows, but I have some that have ~150'000'000 rows.
>
> However, I noticed that when I'm doing this, I quickly run out of disk space, as monetdb is creating really big logs (in /data/monetdb/some_schema/sql_logs). Size of that folder is usually 600KB, but once the process of migration starts, it goes to several GBs. Not only that, but /data/monetdb/some_schema/bat also grows a lot.
>
> The surprising part, and the motivation for this emails, is how well monetdb compresses the data. If I migrate only one day worth of data, and then restart monetdbd (once or twice) and run 'analyze some_schema', sql_logs size drops to normal levels, and I get a lot of reclaimed space (even in bat directory). I tried to put a 'sleep(120)' after each partition and running 'analyze some_schema' after each iteration, but couldn't get the same behaviour (and it is also quite slow). Is there a way to trigger the compression and reduce disk space overhead without restarting mserver, given the above constraints?
>
> Thanks,
> Nikola
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>



More information about the users-list mailing list