Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Mon May 11 18:46:27 CEST 2015


Hi Stefan,

Actually, according to my second email, 430 should be compared to 30, not
to 0.8. So the difference in speed is "only" 15x.

Still, your question remains. However, even if the two performed equally,
the current case would not invest anything..

On 11 May 2015 at 18:40, Stefan Manegold <Stefan.Manegold at cwi.nl> wrote:

>
> Hi Roberto,
>
> thanks for reporting!
>
> This need detailed profiling to understand what's happening.
> Why are 16M look-ups to a 1-tuple hash table 500x more expensive
> than building a 16M hash table (assuming that is it indeed built on the
> fly,
> i.e., does not already exist), and then performing a single look-up ... ?
>
> Stefan
>
>
> ----- On May 11, 2015, at 6:30 PM, Roberto Cornacchia
> roberto.cornacchia at gmail.com wrote:
>
> > Hi there,
> >
> > Triggered by a suspiciously slow join, I had a look at BATsubjoin, in
> particular
> > at how it is decided which algorithm to use.
> >
> > I looked at a couple of cases with a join on strings.
> > This is one (pseudo syntax):
> >
> > l.count ~16M
> > l.T.sorted = 0
> > l.T.revsorted = 0
> >
> > r.count = 1
> > r.T.sorted = 1
> > r.T.revsorted = 1
> >
> > In this case, a hash table on r is created (because it's smaller).
> > This join takes 430ms .
> > I forced swapping l and r, thus built the hash table on the larger bat,
> and then
> > it takes 0.8ms .
> >
> > Notice that though r.count = 1 is a bit of an extreme case, it does
> happen often
> > in practice. And the case with a few tuples would not be very different.
> >
> > The question I wanted to ask is:
> >
> > is it always right to build the hash table on the small table?
> > Perhaps some more heuristics can help?
> > Like: if the smaller is very small, and the larger is not extremely
> large, then
> > hash on the larger one?
> >
> > Roberto
> >
> >
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
>
> --
> | Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20150511/44103d80/attachment.html>


More information about the developers-list mailing list