[Monetdb-developers] [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, 1.143.2.1

Martin Kersten Martin.Kersten at cwi.nl
Tue Oct 16 14:28:43 CEST 2007


Peter,

Stefan Manegold wrote:
> Peter,
> 
> the purpose and/or reason of/for my question is the fact that we cannot
> release the binary packages that Sjoerd built yesterday, because the binary
> database format has been modified without increasing the GDK version number.
> We simply forgot the latter.
> Hence, a new Mserver cannot notice in case it is started on an pld database,
> and instead of complaining properly, it will just behave strangely, crash
> and/or corrup the database. This is not good.
> Consequently, Sjoerd needs to re-do the building once again, and we have to
> think of how to solve the above problem.
> Obviously, one solution is th "simply" increase the GDK version number.
> But then, users have to abandon or dump/retore their databases now and with
> the next release (GDK-2).
> An other alternative would be to undo the string hash function changes in
> the release branch, keeping the same binary format for now, and include both
> binary format changes in one release.
> In particular if the string hash function changes are "only" for performance
> reasons, both Sjoerd, Niels, and I would favor the second solution.
The same holds for me and Fabian. I would even go as far as completely
undoing this patch, aside from the absolute minimum necessary to avoid
known and demonstrated bugs (with proper functional tests).


> After all, we and our users have been able to live with the "old" string
> hash function for years. Those that really require the new one can switch
> to the development version.

Yes, a few months ago we have (after strong discussions) decided that
*any* interface issue should be discussed thoroughly upfront.
This involves both internal/external APIs, anything that may severely
complicate live for others. This also was enforced for checking in
features in the stable. In the check-in messages on the stable
it was repeatedly mentioned that any change of this kind was
subject to approval by Sjoerd.

Again, I see too many last minute actions with possibly far
reaching consequences.

I consider a change in storage layout a sufficient discontinuity to
not include this without a few months internal use. Otherwise, we might
have also released the GDK2 code already.

> 
> I'm busy with testing --- on a 8GB dual-dual-core 2GHz Opteron running
> 64-bit FedoraCore 6 and and optimized 64-bit Mserver with 64-bit OIDs,
> shredding the 19 GB file read-only took 11 hours, both with the old and the
> new hash functions (Mserver grew to 10 GB res, >50 GB virt).
Sorry, but this is a performance test and not a functionality test.
As such it can not and should not block a release.

> Shredding updateable without your two fixes took only 2 hours, but the
> resulting DB is currupt as reported in
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
> .
> Shredding updateable with your two fixes is still busy, running for more
> than 6 hours, now...
> 
> Stefan
> 
> 
> On Tue, Oct 16, 2007 at 01:46:47PM +0200, Peter Boncz wrote:
>> Hi,
>>
>> Hm, I cannot really understand the purpose of the question. And what is
>> wrong with performance fixes?
>>
>> both fixes are related to the same bug:
>> - the remap failing is addressed by the gdk_posix fix
>> - the shredding in the bug report taking excessively long is addressed by
>> the gdk_atoms fix
>>
>> indeed, any hash function can have collisions.. it all depends on the
>> distribution.
>>
>> Peter
>>
>> PS most probably, this mail (sent from my home account) will be rejected by
>> the  sourceforge mailing list -- and I cannot sent through CWI from home as
>> the secure mail sending is not supported by CWI staff for microsoft
>> emailers.
>>
>>
>> -----Original Message-----
>> From: Stefan Manegold [mailto:Stefan.Manegold at cwi.nl]
>> Sent: dinsdag 16 oktober 2007 12:01
>> To: Peter Boncz
>> Cc: monetdb-developers at lists.sourceforge.net
>> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx,
>> MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143,
>> 1.143.2.1
>>
>>
>> Peter,
>>
>> which part of your changes do fix the problem with updatedable shredding of
>> large XML documents as reporten in
>> [ 1811229 ] [ADT] Adding large document, with update support
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
>> 967&atid=482468
>> ?
>>
>> The new has string function in gdk_atoms.mx or the file descriptor fixes in
>> gdk_posix.mx?
>>
>> The former looks for like a performance fix to me --- too many collisions
>> should only slows the system down, but not copromize its
>> fucntionallity/correctness, right?
>> Also with the new string has functions ("too") many collisions can still
>> occur with certain datasets ...
>>
>> Stefan
>>
>>
>> On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
>>> Update of /cvsroot/monetdb/MonetDB/src/gdk
>>> In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
>>>
>>> Modified Files:
>>>       Tag: MonetDB_1-20
>>> 	gdk_atoms.mx gdk_posix.mx
>>> Log Message:
>>>
>>> [checkin on behalf of Peter]
>>>
>>> fixing XQuery bug
>>> [ 1811229 ] [ADT] Adding large document, with update support
>>>
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
>> 967&atid=482468
>>> gdk_atoms.mx:
>>> - hash collisions in strings that consists of digits only (a common case!)
>>>   we now use a fast derivative of the Bob Jenkins function from now on
>>>
>>>   Really bad collisions, in case of the 20GB document of the bug report,
>>>   shredding took 8 hours before, 1 hour after this change.
>>>
>>>   NOTE: this change affects the binary format (string heaps) and all
>> product
>>>         families, as the hash function is a compiled-in macro!
>>>         In particular, lookup operations and joins on SQL (Monet4/5)
>> columns
>>>         consisting of digits only, but stored in a VARCHAR, should be
>> faster
>>>         after this check-in.
>>>
>>> gdk_posix.mx
>>> - we lost track of the file descriptor for large heaps (the file desc is
>> given
>>>   to the mmap-monitoring-thread to close later), such that the remap
>> function
>>>   could fail (when it was given the illegal file descriptor 0)
>>>
>>>   NOTE: this change only affects xquery it only uses remap()
>>>
>>>
>>> Index: gdk_posix.mx
>>> ===================================================================
>>> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
>>> retrieving revision 1.143
>>> retrieving revision 1.143.2.1
>>> diff -u -d -r1.143 -r1.143.2.1
>>> --- gdk_posix.mx	4 Sep 2007 17:55:20 -0000	1.143
>>> +++ gdk_posix.mx	14 Oct 2007 20:31:33 -0000	1.143.2.1
>>> @@ -615,7 +615,7 @@
>>>  		MT_mmap_tab[i].writable = writable;
>>>  		MT_mmap_tab[i].fd = fd;
>>>  		MT_mmap_tab[i].pincnt = 0;
>>> -		fd = -1;
>>> +		fd = -fd;
>>>  	}
>>>  	(void) pthread_mutex_unlock(&MT_mmap_lock);
>>>  	return fd;
>>> @@ -1051,9 +1051,7 @@
>>>  	}
>>>  	if (ret != (void *) -1L) {
>>>  		hdl->fixed = ret;
>>> -		fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
>>> -		if (fd <= 0)
>>> -			hdl->hdl = (void *) 0;	/* MT_mmap_new keeps the fd */
>>> +		hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode &
>> MMAP_WRITABLE));
>>>  	}
>>>  	return ret;
>>>  }
>>> @@ -1061,13 +1059,12 @@
>>>  void *
>>>  MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
>>>  {
>>> -	void *ret;
>>> -
>>> -	ret = mmap(hdl->fixed,
>>> +        int fd = (int) (ssize_t) hdl->hdl;
>>> +	void *ret = mmap(hdl->fixed,
>>>  		   len,
>>>  		   ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
>>>  		   ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) :
>> MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
>>> -		   (int) (ssize_t) hdl->hdl,
>>> +                   (fd < 0)?-fd:fd,
>>>  		   off);
>>>
>>>  	if (ret != (void *) -1L) {
>>> @@ -1083,9 +1080,7 @@
>>>  MT_mmap_close(MT_mmap_hdl *hdl)
>>>  {
>>>  	int fd = (int) (ssize_t) hdl->hdl;
>>> -
>>> -	if (fd)
>>> -		close(fd);
>>> +	if (fd > 0) close(fd);
>>>  	hdl->hdl = NULL;
>>>  }
>>>
>>>
>>> Index: gdk_atoms.mx
>>> ===================================================================
>>> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
>>> retrieving revision 1.134
>>> retrieving revision 1.134.6.1
>>> diff -u -d -r1.134 -r1.134.6.1
>>> --- gdk_atoms.mx	2 May 2007 16:16:58 -0000	1.134
>>> +++ gdk_atoms.mx	14 Oct 2007 20:31:32 -0000	1.134.6.1
>>> @@ -1878,13 +1878,19 @@
>>>  rotates all characters together. It is optimized to process 2 characters
>>>  at a time (adding 16-bits to the hash value each iteration).
>>>  @h
>>> -#define GDK_STRHASH(x,y) {                                             \
>>> -        str _c = (str) (x);                                            \
>>> -        for((y)=0; _c[0] && _c[1]; _c+=2) {                            \
>>> -                 (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] <<
>> 8) ^ _c[0];\
>>> -        }                                                              \
>>> -        (y) ^= _c[0];                                                  \
>>> +#define GDK_STRHASH(x,y) {\
>>> +     str _key = (str) (x);\
>>> +     int _i;\
>>> +     for (_i = y = 0; _key[_i]; _i++) {\
>>> +         y += _key[_i];\
>>> +         y += (y << 10);\
>>> +         y ^= (y >> 6);\
>>> +     }\
>>> +     y += (y << 3);\
>>> +     y ^= (y >> 11);\
>>> +     y += (y << 15);\
>>>  }
>>> +
>>>  @c
>>>  hash_t
>>>  strHash(str s)
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by: Splunk Inc.
>>> Still grepping through log files to find problems?  Stop.
>>> Now Search log events and configuration files using AJAX and a browser.
>>> Download your FREE copy of Splunk now >> http://get.splunk.com/
>>> _______________________________________________
>>> Monetdb-checkins mailing list
>>> Monetdb-checkins at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
>> --
>> | Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
>> | CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
>> | 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
>> | The Netherlands     | Fax : +31 (20) 592-4312       |
> 





More information about the developers-list mailing list