[Monetdb-developers] [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, 1.143.2.1

Stefan Manegold Stefan.Manegold at cwi.nl
Wed Oct 17 13:54:06 CEST 2007


Just for the records:

I finally managed to finish my experiments regarding
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
and the related code changes. For those interested, here's the detailed
story:


"S08-64" System (beo-24):
- 2x 64-bit Dual-Core Opteron 270 @ 2 GHz
- 8 GB memory
- MonetDB/XQuery 0.20, 64-bit, 64-bit OIDs, --enable-optimize (gcc 4.1.2)

"S16-32" System (core-1):
- 4x 64-bit Dual-Core Opteron 870 @ 2 GHz
- 16 GB memory
- MonetDB/XQuery 0.20, 64-bit, 32-bit OIDs, --enable-optimize (gcc 4.1.2)

Document:
http://mirror.openstreetmap.nl/planet/planet-071003.osm.bz2
(extracted: 19 GB XML file)

"SR" Shredding read-only:
pf:add-doc(".../planet-071003.osm","planet-071003.osm")

"SU" Shredding updateable:
pf:add-doc(".../planet-071003.osm","planet-071003.osm","planet-071003.osm",5)

"QR"/"QU" Count query:
count(doc("planet-071003.osm")//*)

Configurations:
m: without Peter's mmap fix in gdk_posix.mx
   (i.e., using rev. 1.143 of gdk_posix.mx)
M: with Peter's mmap fix in gdk_posix.mx
   (i.e., using rev. 1.143.2.1 of gdk_posix.mx)
h: without Peter's new string hash function in gdk_atoms.mx
   (i.e., using rev. 1.134 of gdk_atoms.mx)
H: with Peter's new string hash function in gdk_atoms.mx
   (i.e., using rev. 1.134.6.1 of gdk_atoms.mx)


Results (wall-clock times):

S08-64:
     SR        QR       SU       QU[1]     QU[2]
mh   659m34s   1m09s    81m06s    ERROR    -
mH   -         -       383m17s    ERROR    -
Mh   -         -        77m03s   344m32s   1m34s
MH   644m01s   0m49s   390m42s   342m47s   1m36s

S16-32:
     SR        QR       SU       QU[1]     QU[2]
mh   127m59s   0m17s    43m14s    ERROR    -
mH   110m33s   0m16s    26m26s    ERROR    -
Mh   128m11s   0m18s    44m00s   100m50s   1m15s
MH   191m42s   0m17s    25m43s   101m37s   1m21s

(NB: "SR" includes building of indices, while "SU" does not;
 consequently, "QR" can exploit the indices built during "SR", while "QU[1]"
 has to build the indices first, and only "QU[2]" can exploit them.)


The mmap fix in gdk_posix.mx appears to be sufficient to prevent the remap
ERROR reported (for a system and configuration similar to "S08-64")
in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468

I'll leave the further interpretation of the above results to the interested
recipient / reader.


Stefan


On Tue, Oct 16, 2007 at 01:46:47PM +0200, Peter Boncz wrote:
> Hi,
> 
> Hm, I cannot really understand the purpose of the question. And what is
> wrong with performance fixes?
> 
> both fixes are related to the same bug:
> - the remap failing is addressed by the gdk_posix fix
> - the shredding in the bug report taking excessively long is addressed by
> the gdk_atoms fix
> 
> indeed, any hash function can have collisions; it all depends on the
> distribution.
> 
> Peter
> 
> PS most probably, this mail (sent from my home account) will be rejected by
> the SourceForge mailing list, and I cannot send through CWI from home, as
> secure mail sending is not supported by CWI staff for Microsoft mail
> clients.
> 
> 
> -----Original Message-----
> From: Stefan Manegold [mailto:Stefan.Manegold at cwi.nl]
> Sent: Tuesday, 16 October 2007 12:01
> To: Peter Boncz
> Cc: monetdb-developers at lists.sourceforge.net
> Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx,
> MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143,
> 1.143.2.1
> 
> 
> Peter,
> 
> which part of your changes fixes the problem with updatable shredding of
> large XML documents as reported in
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
> ?
> 
> The new string hash function in gdk_atoms.mx or the file descriptor fixes in
> gdk_posix.mx?
>
> The former looks like a performance fix to me: too many collisions
> should only slow the system down, but not compromise its
> functionality/correctness, right?
> Also, with the new string hash function, ("too") many collisions can still
> occur with certain datasets...
> 
> Stefan
> 
> 
> On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
> > Update of /cvsroot/monetdb/MonetDB/src/gdk
> > In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
> >
> > Modified Files:
> >       Tag: MonetDB_1-20
> > 	gdk_atoms.mx gdk_posix.mx
> > Log Message:
> >
> > [checkin on behalf of Peter]
> >
> > fixing XQuery bug
> > [ 1811229 ] [ADT] Adding large document, with update support
> >
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
> >
> > gdk_atoms.mx:
> > - hash collisions in strings that consist of digits only (a common case!):
> >   from now on we use a fast derivative of the Bob Jenkins hash function
> >
> >   The collisions were really bad: in the case of the 20GB document of the
> >   bug report, shredding took 8 hours before and 1 hour after this change.
> >
> >   NOTE: this change affects the binary format (string heaps) and all product
> >         families, as the hash function is a compiled-in macro!
> >         In particular, lookup operations and joins on SQL (Monet4/5) columns
> >         consisting of digits only, but stored in a VARCHAR, should be faster
> >         after this check-in.
> >
> > gdk_posix.mx:
> > - we lost track of the file descriptor for large heaps (the file descriptor
> >   is given to the mmap-monitoring thread to close later), such that the
> >   remap function could fail (when it was given the illegal file descriptor 0)
> >
> >   NOTE: this change only affects XQuery, as only XQuery uses remap()

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |



