[Monetdb-developers] Monet crashes with {sum} and {avg}

Agustin Schapira schapira at cs.umass.edu
Wed Aug 22 16:13:23 CEST 2007


I am having more difficulties with {sum} and {avg}. The problems
happen on both 4.16.2 and 4.18.0, on Linux and OS X, in 32- and
64-bit builds, with and without optimizations.

I have been able to isolate the problem and write a simple test case  
that shows the error. The attached file has the contents of a BAT; if  
you save it and then run the following script:

var b:=bat(str,int).import(<filename>);
var x:={sum}(b);

then you will get the following error:

*** glibc detected *** double free or corruption (!prev): 0x00000000006a6360 ***

and Monet will crash.

The failing free is called from line 1819 of
MonetDB4/src/monet/monet_multiplex.mx. It is inside the
'interpret_setaggr' function, and it happens only with the
'non-optimized hash' implementation of setaggr starting on line 1694.
In fact, if you sort the BAT before calling {sum}, there is no error.
The same is true when you call, for example, {count}, which uses a
different implementation of setaggr to process the {}.

The failing free appears under the 'bunins_failed' label (line
1800), which in turn is reached from the loop that processes the
aggregate when, I think, its call to 'bunins_unary' fails. Here's the
failing code, starting on line 1694:

	BATloopFast(extent, p, q, yy) {
		if (BATprepareHash(b))
			goto bunins_failed;
		HASHloop(b, b->hhash, hh, last) {
			r = BUNptr(b, hh);

I don't understand why @:bunins_unary(r,hh)@ should fail. I have  
noticed, however, that if I remove the last couple of lines from the  
attached file then the call to {sum} doesn't crash. These last two  
lines are exactly the same as others, so it's not their particular  
value that matters, but rather the number of rows in the BAT with the  
"Other" head value.

Do you have any suggestions? Any other information that I can provide  
to help you fix the problem?
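
In case it helps, here is a minimal C sketch of the kind of failure I
suspect: a buffer is released inside the loop and then released again
under a shared failure label. This is not MonetDB code. The names
'aggregate', 'release' and 'fail_at' are made up for illustration, and
I have not confirmed that interpret_setaggr actually behaves this way.
The sketch uses a release helper that clears the pointer, so the
second release becomes a harmless free(NULL) instead of a double free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical illustration only; none of these names exist in MonetDB. */

/* Free *p and clear it, so releasing it a second time is a no-op. */
static void release(int **p)
{
    free(*p);   /* free(NULL) is defined to do nothing */
    *p = NULL;
}

/*
 * Sum n values through a scratch buffer. If fail_at is a valid index,
 * simulate a failed insert at that position.
 * Returns 0 on success (and stores the sum), -1 on failure.
 */
int aggregate(const int *vals, int n, int fail_at, long *sum_out)
{
    int *scratch;
    long sum = 0;
    int i;

    assert(n >= 0);
    scratch = malloc(sizeof *scratch * (n > 0 ? (size_t)n : 1));
    if (scratch == NULL)
        goto insert_failed;

    for (i = 0; i < n; i++) {
        if (i == fail_at) {
            /* An inner step gives up and releases the buffer itself ... */
            release(&scratch);
            goto insert_failed;
        }
        scratch[i] = vals[i];
        sum += scratch[i];
    }

    release(&scratch);
    *sum_out = sum;
    return 0;

insert_failed:
    /* ... and the shared label releases it again. With a plain free()
     * and no NULL reset, this second call would be a double free. */
    release(&scratch);
    return -1;
}
```

With a plain free(scratch) in both places, the failing path would free
the same pointer twice, which is the kind of error glibc reports above.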

Thanks a lot again,

-- Agustin

PS: On OS X, instead of the crash you get an error message "malloc:  
***  Deallocation of a pointer not malloced: 0x281ee00; This could be  
a double free(), or free() called with the middle of an allocated  
block", but for all practical purposes it's the same: the free() is  
called from the same place in interpret_setaggr. The difference is  
just in the way the Mac equivalent of glibc handles the double free.

