[Monetdb-developers] hashjoin and strHash

Niels Nes Niels.Nes at cwi.nl
Sat Dec 19 15:46:19 CET 2009


On Sat, Dec 19, 2009 at 03:42:32PM +0100, Sjoerd Mullender wrote:
> On 2009-12-19 15:28, Lefteris wrote:
> > On Sat, Dec 19, 2009 at 2:59 PM, Niels Nes <Niels.Nes at cwi.nl> wrote:
> >> On Sat, Dec 19, 2009 at 12:21:09PM +0100, Wouter Alink wrote:
> >>> Lefteris, you are correct in that i meant 'second time the query was
> >>> run' when I wrote 'hot run'.
> >>>
> >>> I see that at GDK level reuse cannot be estimated. Although with
> >>> current hardware which has an abundance of memory, and the fact that
> >>> strings take up much more storage than a single BUN (so a hash-entry
> >>> is usually relatively small compared to its data) GDK might weigh the
> >>> additional costs. GDK also decides which things to keep in memory or
> >>> throw it out, which in turn is also based on reuse.
> >>> The costs for performing the initial join are dominated by the strHash
> >>> function, and building the hashtable on the big BAT or the smaller BAT
> >>> makes (almost) no difference, except for the additional memory use. If
> >>> on such a big bat again a join is performed, it will be beneficial to
> >>> have the hashtable in place.
> >>>
> >>> What I was hoping for were explanations of situations where it makes
> >>> no sense to build the hashtable on the bigger string BAT, but a good
> >>> counter-example I haven't seen. In general, i can see, it would not be
> >>> beneficial if the big BAT is not joined twice, but if it doesn't hurt
> >>> too much, couldn't it just be the default?
> >>
> >> It cannot be the default as hashtables are simply to big. Assume a
> >> single column which barely fits into your main memory. In that case the
> >> hashtable will not fit leading to sub optimal performance.
> >>
> >> One global rule could be
> >>        build hash on largest if it still fits into L2.
> >>        build hash on smallest once we run out of L2.
> >>
> >> Niels
> >>
> > 
> > Fair enough, but if the big fits in L2, then it is not so big, and
> > then the small is in L2 too, I think you will not see so much
> > difference there, the entire join will be almost in L2, right?
> > 
> > Wouter, is you big BAT string sorted? while the small unsorted? (many
> > times happens that to me because the big is persistent and the small
> > intermediate result). In such cases you can also trigger a sorting on
> > the small and then merge join. there is some code in the gdk for that,
> > but too strict in my opinion.
> 
> We should also consider building the hash table on both sides.  Since
> the hashes have to be calculated, no matter what, it makes sense to
> store those values.  This doesn't necessarily have to be done up front.
>  The hashes for the second table could be stored while doing the scan
> for the join.

All true if you have the space to store them, and although you did
calculate it doesn't make it free to store a hash, it will bring down
the memory->cache bandwidth with about a factor 2.

Niels
> 
> > Lefteris
> > 
> >>>
> >>> Eventually I would like to be using the SQL layer only. Here there
> >>> would be plenty of tables with string-columns, and some will be joined
> >>> against. Should a MAL optimizer detect that I am about to join two
> >>> string-BATs, and that one BAT is bigger than the other and has many
> >>> different values, and therefore should build a hashtable on the bigger
> >>> one? The MAL optimizer can only guess about my next query (although I
> >>> agree that it could do a better job at guessing), and calculating
> >>> heapsize/batsize seems to be an operation that is also difficult to do
> >>> on a MAL layer.
> >>>
> >>> is really nobody in favour of changing the behavior of joining string
> >>> bats for large bats with many different values? well, than I give up.
> >>> Wouter
> >>>
> >>>
> >>> 2009/12/18 Stefan Manegold <Stefan.Manegold at cwi.nl>:
> >>>> Hi Wouter,
> >>>>
> >>>> in the lines of Lefteris' reply:
> >>>> for a single join with no hash table present a priori, the number of hash
> >>>> function calls is euqal to the number of BUNs in both BATs; for each inner
> >>>> BUN the function need to be called to build the hash table, for each outer
> >>>> BUN it needs to be called to probe the hash table. Hence, the pure hashing
> >>>> costs are independent of which BAT is inner and which outer.
> >>>> Given that, the reason the choose the smaller as inner is indeed to increase
> >>>> spacial locallity (and thus performance) of the inherently random access
> >>>> while building and probing the hashtable.
> >>>>
> >>>> As Lefteris pointed out, the "operational optimization" in GDK is a pure
> >>>> peephole optimization dealing only with the very operation at hand. I.e., in
> >>>> general it cannot anticipate the benefits of future re-use of efforts, like
> >>>> investing in the (more expensive) building of a larger hash table to be able
> >>>> to re-use this in several later operations --- which IMHO is independent of
> >>>> the data type. Such descisions need to be made at higher levels, either in
> >>>> MAL optimizers or in the front-end that generates the MAL plan.
> >>>>
> >>>> Stefan
> >>>>
> >>>>
> >>>> On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
> >>>>> Hi Wouter,
> >>>>>
> >>>>> funny think, I had the same exact problem and we were thinking about
> >>>>> this issue. The idea here is that this observation for strings might
> >>>>> not be always true, and it is a situation that cannot be always
> >>>>> determined on the kernel level. Correct me if I am wrong, but your
> >>>>> benefit on query comes because the hash in the large BAT is already
> >>>>> there, that's why the second time you get 0.01? You mention hot run so
> >>>>> I assume the BAT is already there with a hash index. While in the
> >>>>> original situation the hash is on the small BAT thus you don't benefit
> >>>>> from the hot run. But if a big BAT of strings is to be used again it
> >>>>> is unknown in the gdk level. So, I solved the problem by forcing the
> >>>>> hash index on the big BAT in a higher level (in Monet5) where it knows
> >>>>> something more about the application (in my case RDF store). Can you
> >>>>> do instead that? force the hash index in a higher level for you
> >>>>> application? If gdk see a hash index already there, then it will
> >>>>> choose that independent of the size.
> >>>>>
> >>>>> lefteris
> >>>>>
> >>>>> On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink <wouter.alink at gmail.com> wrote:
> >>>>>> Dear developers,
> >>>>>>
> >>>>>> I would like to propose a change in GDK and hear opinions. It is about
> >>>>>> the following issue:
> >>>>>>
> >>>>>> in the BATjoin code, if there is no possibility to do a fetch or merge
> >>>>>> join, a hashjoin is performed. A hashtable is created for the smallest
> >>>>>> BAT. The reasons (i could think of) for choosing the smallest BAT for
> >>>>>> the hashtable are that less space is required for the hashtable (which
> >>>>>> in turn causes less cache misses when doing a lookup) and also because
> >>>>>> the hashfunction used is assumed to be very inexpensive (it needs to
> >>>>>> be calculated for each item in the large bat each time a join is
> >>>>>> performed).
> >>>>>> I can see that the hashfunction can be very efficient for data types
> >>>>>> without indirection, but I feel that for data types like strings in
> >>>>>> some cases this is a little different. If a string BAT for example
> >>>>>> contains many different values (i.e. is not a bat which contains
> >>>>>> enumeration values) the hashfunction will not be inexpensive anymore
> >>>>>> (many cache misses), as each hashfunction call needs to hash a whole
> >>>>>> (arbitrary long) string at an arbitrary place in the heap.
> >>>>>>
> >>>>>> Is it perhaps possible to specify that, when a BAT of type 'str' has
> >>>>>> many different values a hashtable may be build on the large BAT
> >>>>>> instead of on the small BAT?
> >>>>>>
> >>>>>> Reason that I ask this: I was analysing costs of a query in which I
> >>>>>> had a few short strings (26 tuples, 1-column table: varchar) which I
> >>>>>> wanted to look up in a dictionary (9M tuples, 2-column table:
> >>>>>> int,varchar).  "SELECT a.id FROM longlist AS a JOIN smalllist as b ON
> >>>>>> a.strvalue=b.strvalue;"
> >>>>>> The result is a small list of integers (26 or less tuples). This
> >>>>>> operation currently takes roughly 1.5 seconds for a hot run, mostly
> >>>>>> due to 9M strHash operations. By applying the patch below the
> >>>>>> execution time for a hot run dropped down to .01 seconds. The
> >>>>>> performance gain is caused by only having to perform strHash on the
> >>>>>> items in the small bat once the hashtable for the large bat has been
> >>>>>> created.
> >>>>>>
> >>>>>> Any suggestions whether such a change is useful? Which benchmarks will
> >>>>>> be influenced?
> >>>>>>
> >>>>>> I guess this code change is probably not useful for large string BATs
> >>>>>> with only few different values, but perhaps a guess could be made how
> >>>>>> diverse the strings in a bat are (by taking a sample or perhaps simply
> >>>>>> by looking at the ratio batsize/heapsize), and based on that determine
> >>>>>> whether to build it on the large or small BAT?
> >>>>>>
> >>>>>> Greetings,
> >>>>>> Wouter
> >>>>>>
> >>>>>>
> >>>>>> Index: src/gdk/gdk_relop.mx
> >>>>>> ===================================================================
> >>>>>> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
> >>>>>> retrieving revision 1.167.2.4
> >>>>>> diff -u -r1.167.2.4 gdk_relop.mx
> >>>>>> --- src/gdk/gdk_relop.mx        20 Nov 2009 13:04:06 -0000      1.167.2.4
> >>>>>> +++ src/gdk/gdk_relop.mx        18 Dec 2009 14:59:13 -0000
> >>>>>> @@ -1232,7 +1232,12 @@
> >>>>>>  @-
> >>>>>>  hash join: the bread&butter join of monet
> >>>>>>  @c
> >>>>>> -       /* Simple rule, always build hash on the smallest */
> >>>>>> +       /* Simple rule, always build hash on the smallest,
> >>>>>> +                except when it is a string-join, then we do the opposite */
> >>>>>> +       if (swap && rcount < lcount && l->ttype == TYPE_str) {
> >>>>>> +               ALGODEBUG THRprintf(GDKout, "#BATjoin:
> >>>>>> BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n",
> >>>>>> estimate);
> >>>>>> +               return BATmirror(BAThashjoin(BATmirror(r),
> >>>>>> BATmirror(l), estimate));
> >>>>>> +       }
> >>>>>>        if (swap && rcount > lcount) {
> >>>>>>                ALGODEBUG THRprintf(GDKout, "#BATjoin:
> >>>>>> BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n",
> >>>>>> estimate);
> >>>>>>
> >>>>>> ------------------------------------------------------------------------------
> >>>>>> This SF.Net email is sponsored by the Verizon Developer Community
> >>>>>> Take advantage of Verizon's best-in-class app development support
> >>>>>> A streamlined, 14 day to market process makes app distribution fast and easy
> >>>>>> Join now and get one step closer to millions of Verizon customers
> >>>>>> http://p.sf.net/sfu/verizon-dev2dev
> >>>>>> _______________________________________________
> >>>>>> Monetdb-developers mailing list
> >>>>>> Monetdb-developers at lists.sourceforge.net
> >>>>>> https://lists.sourceforge.net/lists/listinfo/monetdb-developers
> >>>>>>
> >>>>>
> >>>>> ------------------------------------------------------------------------------
> >>>>> This SF.Net email is sponsored by the Verizon Developer Community
> >>>>> Take advantage of Verizon's best-in-class app development support
> >>>>> A streamlined, 14 day to market process makes app distribution fast and easy
> >>>>> Join now and get one step closer to millions of Verizon customers
> >>>>> http://p.sf.net/sfu/verizon-dev2dev
> >>>>> _______________________________________________
> >>>>> Monetdb-developers mailing list
> >>>>> Monetdb-developers at lists.sourceforge.net
> >>>>> https://lists.sourceforge.net/lists/listinfo/monetdb-developers
> >>>>>
> >>>>
> >>>> --
> >>>> | Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
> >>>> | CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
> >>>> | 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
> >>>> | The Netherlands     | Fax : +31 (20) 592-4199       |
> >>>>
> >>>
> >>> ------------------------------------------------------------------------------
> >>> This SF.Net email is sponsored by the Verizon Developer Community
> >>> Take advantage of Verizon's best-in-class app development support
> >>> A streamlined, 14 day to market process makes app distribution fast and easy
> >>> Join now and get one step closer to millions of Verizon customers
> >>> http://p.sf.net/sfu/verizon-dev2dev
> >>> _______________________________________________
> >>> Monetdb-developers mailing list
> >>> Monetdb-developers at lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/monetdb-developers
> >>
> >> --
> >> Niels Nes, Centrum Wiskunde & Informatica (CWI)
> >> Science Park 123, 1098 XG Amsterdam, The Netherlands
> >> room L3.14,  phone ++31 20 592-4098
> >> url: http://www.cwi.nl/~niels   e-mail: Niels.Nes at cwi.nl
> >>
> >> ------------------------------------------------------------------------------
> >> This SF.Net email is sponsored by the Verizon Developer Community
> >> Take advantage of Verizon's best-in-class app development support
> >> A streamlined, 14 day to market process makes app distribution fast and easy
> >> Join now and get one step closer to millions of Verizon customers
> >> http://p.sf.net/sfu/verizon-dev2dev
> >> _______________________________________________
> >> Monetdb-developers mailing list
> >> Monetdb-developers at lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/monetdb-developers
> >>
> > 
> > ------------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Verizon Developer Community
> > Take advantage of Verizon's best-in-class app development support
> > A streamlined, 14 day to market process makes app distribution fast and easy
> > Join now and get one step closer to millions of Verizon customers
> > http://p.sf.net/sfu/verizon-dev2dev 
> > _______________________________________________
> > Monetdb-developers mailing list
> > Monetdb-developers at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/monetdb-developers
> 
> 
> -- 
> Sjoerd Mullender
> 
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev 
> _______________________________________________
> Monetdb-developers mailing list
> Monetdb-developers at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-developers

-- 
Niels Nes, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14,  phone ++31 20 592-4098
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes at cwi.nl




More information about the developers-list mailing list