[Monetdb-developers] [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, 1.143.2.1

Stefan Manegold Stefan.Manegold at cwi.nl
Tue Oct 16 14:09:40 CEST 2007


Peter,

the purpose and/or reason of/for my question is the fact that we cannot
release the binary packages that Sjoerd built yesterday, because the binary
database format has been modified without increasing the GDK version number.
We simply forgot the latter.
Hence, a new Mserver cannot notice in case it is started on an pld database,
and instead of complaining properly, it will just behave strangely, crash
and/or corrup the database. This is not good.
Consequently, Sjoerd needs to re-do the building once again, and we have to
think of how to solve the above problem.
Obviously, one solution is th "simply" increase the GDK version number.
But then, users have to abandon or dump/retore their databases now and with
the next release (GDK-2).
An other alternative would be to undo the string hash function changes in
the release branch, keeping the same binary format for now, and include both
binary format changes in one release.
In particular if the string hash function changes are "only" for performance
reasons, both Sjoerd, Niels, and I would favor the second solution.
After all, we and our users have been able to live with the "old" string
hash function for years. Those that really require the new one can switch
to the development version.

I'm busy with testing --- on a 8GB dual-dual-core 2GHz Opteron running
64-bit FedoraCore 6 and and optimized 64-bit Mserver with 64-bit OIDs,
shredding the 19 GB file read-only took 11 hours, both with the old and the
new hash functions (Mserver grew to 10 GB res, >50 GB virt).
Shredding updateable without your two fixes took only 2 hours, but the
resulting DB is currupt as reported in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
.
Shredding updateable with your two fixes is still busy, running for more
than 6 hours, now...

Stefan


On Tue, Oct 16, 2007 at 01:46:47PM +0200, Peter Boncz wrote:
> Hi,
> 
> Hm, I cannot really understand the purpose of the question. And what is
> wrong with performance fixes?
> 
> both fixes are related to the same bug:
> - the remap failing is addressed by the gdk_posix fix
> - the shredding in the bug report taking excessively long is addressed by
> the gdk_atoms fix
> 
> indeed, any hash function can have collisions.. it all depends on the
> distribution.
> 
> Peter
> 
> PS most probably, this mail (sent from my home account) will be rejected by
> the  sourceforge mailing list -- and I cannot sent through CWI from home as
> the secure mail sending is not supported by CWI staff for microsoft
> emailers.
> 
> 
> -----Original Message-----
> From: Stefan Manegold [mailto:Stefan.Manegold at cwi.nl]
> Sent: dinsdag 16 oktober 2007 12:01
> To: Peter Boncz
> Cc: monetdb-developers at lists.sourceforge.net
> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx,
> MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143,
> 1.143.2.1
> 
> 
> Peter,
> 
> which part of your changes do fix the problem with updatedable shredding of
> large XML documents as reporten in
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
> 967&atid=482468
> ?
> 
> The new has string function in gdk_atoms.mx or the file descriptor fixes in
> gdk_posix.mx?
> 
> The former looks for like a performance fix to me --- too many collisions
> should only slows the system down, but not copromize its
> fucntionallity/correctness, right?
> Also with the new string has functions ("too") many collisions can still
> occur with certain datasets ...
> 
> Stefan
> 
> 
> On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
> > Update of /cvsroot/monetdb/MonetDB/src/gdk
> > In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
> >
> > Modified Files:
> >       Tag: MonetDB_1-20
> > 	gdk_atoms.mx gdk_posix.mx
> > Log Message:
> >
> > [checkin on behalf of Peter]
> >
> > fixing XQuery bug
> > [ 1811229 ] [ADT] Adding large document, with update support
> >
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56
> 967&atid=482468
> >
> > gdk_atoms.mx:
> > - hash collisions in strings that consists of digits only (a common case!)
> >   we now use a fast derivative of the Bob Jenkins function from now on
> >
> >   Really bad collisions, in case of the 20GB document of the bug report,
> >   shredding took 8 hours before, 1 hour after this change.
> >
> >   NOTE: this change affects the binary format (string heaps) and all
> product
> >         families, as the hash function is a compiled-in macro!
> >         In particular, lookup operations and joins on SQL (Monet4/5)
> columns
> >         consisting of digits only, but stored in a VARCHAR, should be
> faster
> >         after this check-in.
> >
> > gdk_posix.mx
> > - we lost track of the file descriptor for large heaps (the file desc is
> given
> >   to the mmap-monitoring-thread to close later), such that the remap
> function
> >   could fail (when it was given the illegal file descriptor 0)
> >
> >   NOTE: this change only affects xquery it only uses remap()
> >
> >
> > Index: gdk_posix.mx
> > ===================================================================
> > RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
> > retrieving revision 1.143
> > retrieving revision 1.143.2.1
> > diff -u -d -r1.143 -r1.143.2.1
> > --- gdk_posix.mx	4 Sep 2007 17:55:20 -0000	1.143
> > +++ gdk_posix.mx	14 Oct 2007 20:31:33 -0000	1.143.2.1
> > @@ -615,7 +615,7 @@
> >  		MT_mmap_tab[i].writable = writable;
> >  		MT_mmap_tab[i].fd = fd;
> >  		MT_mmap_tab[i].pincnt = 0;
> > -		fd = -1;
> > +		fd = -fd;
> >  	}
> >  	(void) pthread_mutex_unlock(&MT_mmap_lock);
> >  	return fd;
> > @@ -1051,9 +1051,7 @@
> >  	}
> >  	if (ret != (void *) -1L) {
> >  		hdl->fixed = ret;
> > -		fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
> > -		if (fd <= 0)
> > -			hdl->hdl = (void *) 0;	/* MT_mmap_new keeps the fd */
> > +		hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode &
> MMAP_WRITABLE));
> >  	}
> >  	return ret;
> >  }
> > @@ -1061,13 +1059,12 @@
> >  void *
> >  MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
> >  {
> > -	void *ret;
> > -
> > -	ret = mmap(hdl->fixed,
> > +        int fd = (int) (ssize_t) hdl->hdl;
> > +	void *ret = mmap(hdl->fixed,
> >  		   len,
> >  		   ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
> >  		   ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) :
> MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
> > -		   (int) (ssize_t) hdl->hdl,
> > +                   (fd < 0)?-fd:fd,
> >  		   off);
> >
> >  	if (ret != (void *) -1L) {
> > @@ -1083,9 +1080,7 @@
> >  MT_mmap_close(MT_mmap_hdl *hdl)
> >  {
> >  	int fd = (int) (ssize_t) hdl->hdl;
> > -
> > -	if (fd)
> > -		close(fd);
> > +	if (fd > 0) close(fd);
> >  	hdl->hdl = NULL;
> >  }
> >
> >
> > Index: gdk_atoms.mx
> > ===================================================================
> > RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
> > retrieving revision 1.134
> > retrieving revision 1.134.6.1
> > diff -u -d -r1.134 -r1.134.6.1
> > --- gdk_atoms.mx	2 May 2007 16:16:58 -0000	1.134
> > +++ gdk_atoms.mx	14 Oct 2007 20:31:32 -0000	1.134.6.1
> > @@ -1878,13 +1878,19 @@
> >  rotates all characters together. It is optimized to process 2 characters
> >  at a time (adding 16-bits to the hash value each iteration).
> >  @h
> > -#define GDK_STRHASH(x,y) {                                             \
> > -        str _c = (str) (x);                                            \
> > -        for((y)=0; _c[0] && _c[1]; _c+=2) {                            \
> > -                 (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] <<
> 8) ^ _c[0];\
> > -        }                                                              \
> > -        (y) ^= _c[0];                                                  \
> > +#define GDK_STRHASH(x,y) {\
> > +     str _key = (str) (x);\
> > +     int _i;\
> > +     for (_i = y = 0; _key[_i]; _i++) {\
> > +         y += _key[_i];\
> > +         y += (y << 10);\
> > +         y ^= (y >> 6);\
> > +     }\
> > +     y += (y << 3);\
> > +     y ^= (y >> 11);\
> > +     y += (y << 15);\
> >  }
> > +
> >  @c
> >  hash_t
> >  strHash(str s)
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems?  Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Monetdb-checkins mailing list
> > Monetdb-checkins at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
> 
> --
> | Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
> | CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
> | 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
> | The Netherlands     | Fax : +31 (20) 592-4312       |

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |




More information about the developers-list mailing list