[Monetdb-developers] [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, 1.143.2.1

Stefan Manegold Stefan.Manegold at cwi.nl
Tue Oct 16 12:01:18 CEST 2007


Peter,

which part of your changes fixes the problem with updatable shredding of
large XML documents, as reported in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
?

The new string hash function in gdk_atoms.mx or the file descriptor fixes in
gdk_posix.mx?

The former looks more like a performance fix to me: too many collisions
should only slow the system down, but not compromise its
functionality/correctness, right?
Also, with the new string hash function, ("too") many collisions can still
occur with certain datasets ...
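
To make the comparison concrete, here is a minimal sketch of both variants
of GDK_STRHASH from the diff quoted below, rewritten as plain C functions.
The new variant has the same structure as Bob Jenkins's one-at-a-time hash.
Assumptions that are not in the check-in: hash_t is modelled as a 32-bit
unsigned integer, characters are read as unsigned char, and the names
hash32, old_strhash, new_strhash and distinct_buckets are made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: hash_t is modelled as a 32-bit unsigned integer. */
typedef uint32_t hash32;

/* Old GDK_STRHASH: consumes two characters per iteration. */
static hash32 old_strhash(const char *s)
{
	const unsigned char *c = (const unsigned char *) s;
	hash32 y = 0;

	for (; c[0] && c[1]; c += 2)
		y = (y << 3) ^ (y >> 11) ^ (y >> 17) ^ ((hash32) c[1] << 8) ^ c[0];
	y ^= c[0];	/* trailing odd character, or the terminating 0 */
	return y;
}

/* New GDK_STRHASH: one character per iteration plus a final mixing step. */
static hash32 new_strhash(const char *s)
{
	const unsigned char *key = (const unsigned char *) s;
	hash32 y = 0;
	size_t i;

	for (i = 0; key[i]; i++) {
		y += key[i];
		y += y << 10;
		y ^= y >> 6;
	}
	y += y << 3;
	y ^= y >> 11;
	y += y << 15;
	return y;
}

/* Hypothetical helper: hash the decimal strings "0" .. "n-1" and count how
 * many distinct buckets (hash & mask) are hit; mask must be below 65536. */
static unsigned distinct_buckets(hash32 (*h)(const char *), unsigned n, unsigned mask)
{
	static unsigned char seen[1 << 16];
	unsigned i, used = 0;
	char buf[16];

	assert(mask < (1u << 16));
	memset(seen, 0, sizeof seen);
	for (i = 0; i < n; i++) {
		unsigned b;

		snprintf(buf, sizeof buf, "%u", i);
		b = h(buf) & mask;
		if (!seen[b]) {
			seen[b] = 1;
			used++;
		}
	}
	return used;
}
```

Calling distinct_buckets(old_strhash, n, mask) and
distinct_buckets(new_strhash, n, mask) with the same n and mask gives a rough
picture of how digit-only keys spread over the buckets. Whatever the numbers
are, a worse spread only means longer collision chains; lookups still return
the same answers.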

Stefan
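
P.S. For reference, a minimal sketch of the sign convention that the
gdk_posix.mx diff quoted below appears to introduce: a negated descriptor
means the mmap table keeps (and later closes) the fd, while the handle still
remembers its number for remap. All helper names are hypothetical, and
intptr_t stands in for the ssize_t casts used in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: tag an fd as owned by the mmap table by negating it,
 * as MT_mmap_new does with "fd = -fd" in the diff. */
static int fd_mark_table_owned(int fd)
{
	assert(fd >= 0);
	return -fd;
}

/* The real descriptor number, whoever owns it (what remap passes to mmap). */
static int fd_number(int tagged)
{
	return tagged < 0 ? -tagged : tagged;
}

/* Only a positive value is owned by the handle, so only that is closed by
 * the handle itself (the "if (fd > 0) close(fd)" in MT_mmap_close). */
static int fd_handle_should_close(int tagged)
{
	return tagged > 0;
}

/* Round trip through the pointer-sized handle field, as with hdl->hdl. */
static void *fd_to_hdl(int tagged)
{
	return (void *) (intptr_t) tagged;
}

static int hdl_to_fd(void *hdl)
{
	return (int) (intptr_t) hdl;
}
```

Note that descriptor 0 cannot carry the tag, since -0 == 0; under this
convention it is never closed by the handle.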


On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
> Update of /cvsroot/monetdb/MonetDB/src/gdk
> In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
> 
> Modified Files:
>       Tag: MonetDB_1-20
> 	gdk_atoms.mx gdk_posix.mx 
> Log Message:
> 
> [checkin on behalf of Peter]
> 
> fixing XQuery bug
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
> 
> gdk_atoms.mx:
> - hash collisions in strings that consist of digits only (a common case!);
>   we now use a fast derivative of the Bob Jenkins function
> 
>   The collisions were really bad: for the 20GB document of the bug report,
>   shredding took 8 hours before and 1 hour after this change.
> 
>   NOTE: this change affects the binary format (string heaps) and all product
>         families, as the hash function is a compiled-in macro!
>         In particular, lookup operations and joins on SQL (Monet4/5) columns
>         consisting of digits only, but stored in a VARCHAR, should be faster
>         after this check-in.
> 
> gdk_posix.mx
> - we lost track of the file descriptor for large heaps (the file desc is given
>   to the mmap-monitoring-thread to close later), such that the remap function
>   could fail (when it was given the illegal file descriptor 0)
> 
>   NOTE: this change only affects XQuery, as only XQuery uses remap()
> 
> 
> Index: gdk_posix.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
> retrieving revision 1.143
> retrieving revision 1.143.2.1
> diff -u -d -r1.143 -r1.143.2.1
> --- gdk_posix.mx	4 Sep 2007 17:55:20 -0000	1.143
> +++ gdk_posix.mx	14 Oct 2007 20:31:33 -0000	1.143.2.1
> @@ -615,7 +615,7 @@
>  		MT_mmap_tab[i].writable = writable;
>  		MT_mmap_tab[i].fd = fd;
>  		MT_mmap_tab[i].pincnt = 0;
> -		fd = -1;
> +		fd = -fd;
>  	}
>  	(void) pthread_mutex_unlock(&MT_mmap_lock);
>  	return fd;
> @@ -1051,9 +1051,7 @@
>  	}
>  	if (ret != (void *) -1L) {
>  		hdl->fixed = ret;
> -		fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
> -		if (fd <= 0)
> -			hdl->hdl = (void *) 0;	/* MT_mmap_new keeps the fd */
> +		hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
>  	}
>  	return ret;
>  }
> @@ -1061,13 +1059,12 @@
>  void *
>  MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
>  {
> -	void *ret;
> -
> -	ret = mmap(hdl->fixed,
> +        int fd = (int) (ssize_t) hdl->hdl;
> +	void *ret = mmap(hdl->fixed,
>  		   len,
>  		   ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
>  		   ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) : MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
> -		   (int) (ssize_t) hdl->hdl,
> +                   (fd < 0)?-fd:fd,
>  		   off);
>  
>  	if (ret != (void *) -1L) {
> @@ -1083,9 +1080,7 @@
>  MT_mmap_close(MT_mmap_hdl *hdl)
>  {
>  	int fd = (int) (ssize_t) hdl->hdl;
> -
> -	if (fd)
> -		close(fd);
> +	if (fd > 0) close(fd);
>  	hdl->hdl = NULL;
>  }
>  
> 
> Index: gdk_atoms.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
> retrieving revision 1.134
> retrieving revision 1.134.6.1
> diff -u -d -r1.134 -r1.134.6.1
> --- gdk_atoms.mx	2 May 2007 16:16:58 -0000	1.134
> +++ gdk_atoms.mx	14 Oct 2007 20:31:32 -0000	1.134.6.1
> @@ -1878,13 +1878,19 @@
>  rotates all characters together. It is optimized to process 2 characters
>  at a time (adding 16-bits to the hash value each iteration).
>  @h
> -#define GDK_STRHASH(x,y) {                                             \
> -        str _c = (str) (x);                                            \
> -        for((y)=0; _c[0] && _c[1]; _c+=2) {                            \
> -                 (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] << 8) ^ _c[0];\
> -        }                                                              \
> -        (y) ^= _c[0];                                                  \
> +#define GDK_STRHASH(x,y) {\
> +     str _key = (str) (x);\
> +     int _i;\
> +     for (_i = y = 0; _key[_i]; _i++) {\
> +         y += _key[_i];\
> +         y += (y << 10);\
> +         y ^= (y >> 6);\
> +     }\
> +     y += (y << 3);\
> +     y ^= (y >> 11);\
> +     y += (y << 15);\
>  }
> +
>  @c
>  hash_t
>  strHash(str s)
> 
> 

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |



