[MonetDB-users] ERROR GDKload failed....CMDthetajoin; operation failed
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
I assume that the first query was executed successfully in a separate transaction. The second query builds a huge intermediate result incrementally.
To understand such queries you might try a toy example first (e.g. a table of 10 elements and look at the intermediate results)
Consider a1.seq = 18 then there are 40M-18 combinations for which a1.seq <a2.seq+17, this is stored in the intermediate... and then add a1.seq= 19 .... roughly leading to a table of 40M^2
You probably mean a2.seq <= a1.seq <a2.seq+17, which should be translated by the SQL front-end in a bandjoin(). I am almost sure it doesnot.
The SQL standard contains WINDOWS functions to handle this case. They are not supported yet.
A hack would be to generate 18 statements, which accumulate the result along the lines: insert into accum(first,last,total) select seq,seq,num from tableA; update accum set last=last +1 set total = total + (select num from tableA where seq= last+1); ...17 times... ...sort/group...
regards, Martin George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=D... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
I guess you want some limitation on the lower and of the band join here.
ie
select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc
Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows.
This will be an other item on my list to fix...
Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=D... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Sorry Niels
I did not correctly report the statement I used. I used the limitation on the lower end of the band just as you wrote it with 'a1.seq>=a2.seq':
select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc
result is same error, same data explosion.
Thanks for your comment though
Best...George
Niels Nes wrote:
On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
I guess you want some limitation on the lower and of the band join here.
ie
select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc
Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows.
This will be an other item on my list to fix...
Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=D... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Sorry:
The statement that generated the error is inccorectly reported in the initial post. The statement was actually:
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq>=a2.seq and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
George
George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
participants (3)
-
George H
-
Martin Kersten
-
Niels Nes