A good strategy for loading bulk data into monetdb

Knezevic Nikola nikkne at gmx.ch
Mon Apr 25 13:22:21 CEST 2016


Hi Martin,

if I try using "LOCKED", I'm greeted with:

$ mysql -C -N -B -u reader -pxxx some_schema -h mhost -e "SELECT * FROM messages WHERE mDate = '2016-03-30' LIMIT 10;"|mclient -d some_schema -s "COPY INTO \"messages__2016-03-30\" FROM STDIN USING DELIMITERS '\t','\n' LOCKED" -
COPY INTO: insufficient privileges: COPY INTO from .. LOCKED requires database administrator rights


Ok, that means that my user doesn't have db admin rights. Since I couldn't find what is the admin role in the documentation, StackOverflow had the following:
http://stackoverflow.com/questions/32165339/grant-privileges-to-users-on-monetdb

So, I tried:
[root at wks038 ~]# mclient -d some_schema -u monetdb
password:
Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP4)
Database: MonetDB v11.21.19 (Jul2015-SP4), 'mapi:monetdb://wks038.intra:50000/some_schema'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>grant monetdb to writer;
operation successful (2.569ms)
sql>\q
[root at wks038 ~]# mclient -d some_schema -u writer
password:
Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP4)
Database: MonetDB v11.21.19 (Jul2015-SP4), 'mapi:monetdb://wks038.intra:50000/some_schema'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>set role monetdb;
Role (monetdb) missing

Then I decided to use the db admin to populate the data, but that didn't go well either:

$ mysql -C -N -B -u reader -pxxx some_schema -h mhost -e "SELECT * FROM messages WHERE mDate = '2016-03-30' LIMIT 10;"|mclient -d some_schema -u monetdb -s "COPY INTO \"messages__2016-03-30\" FROM STDIN USING DELIMITERS '\t','\n' LOCKED" -
COPY INTO: no such table 'messages__2016-03-30'
syntax error, unexpected sqlINT in: "1971276831"


BTW, I thought that LOCKED is only applicable for COPY FROM 'file', not for STDIN.

Any ideas how to resolve this issues?

Thanks,
Nikola


> Gesendet: Montag, 25. April 2016 um 11:54 Uhr
> Von: "Martin Kersten" <martin.kersten at cwi.nl>
> An: users-list at monetdb.org
> Betreff: Re: A good strategy for loading bulk data into monetdb
>
> Hi
> 
> To avoid the logs you should use the LOCKED version.
> see https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto
> 
> 
> 
> On 25/04/16 11:45, Knezevic Nikola wrote:
> > Hi all,
> >
> > I'm trying to load large amounts of data into monetdb (migrating from mysql). Since I'm using Jul2015-SP4 release, I can't create local files and I can't use external tools, my approach is as following:
> > For each table, I partition the data based on the day, and then:
> > 10 create one table in monetdb for each day, and a merge table that takes all these tables
> > 20 fetch data from mysql (mysql -B -N -C ...)
> > 30 _pipe_ data to mclient (that is invoked with mclient ... -c "COPY 2000000 RECORDS FROM STDIN INTO ..." -)
> > 40 load data into corresponding table (the right day)
> > 50 sleep some seconds
> > 60 goto 20
> >
> > Each daily table is around 6'000'000 rows, but I have some that have ~150'000'000 rows.
> >
> > However, I noticed that when I'm doing this, I quickly run out of disk space, as monetdb is creating really big logs (in /data/monetdb/some_schema/sql_logs). Size of that folder is usually 600KB, but once the process of migration starts, it goes to several GBs. Not only that, but /data/monetdb/some_schema/bat also grows a lot.
> >
> > The surprising part, and the motivation for this emails, is how well monetdb compresses the data. If I migrate only one day worth of data, and then restart monetdbd (once or twice) and run 'analyze some_schema', sql_logs size drops to normal levels, and I get a lot of reclaimed space (even in bat directory). I tried to put a 'sleep(120)' after each partition and running 'analyze some_schema' after each iteration, but couldn't get the same behaviour (and it is also quite slow). Is there a way to trigger the compression and reduce disk space overhead without restarting mserver, given the above constraints?
> >
> > Thanks,
> > Nikola
> > _______________________________________________
> > users-list mailing list
> > users-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list
> >
> 
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
> 


More information about the users-list mailing list