Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Tue May 12 13:02:50 CEST 2015


Yes, that makes sense.

I also got suspicious about the effect of very few distinct values.
That's why I included a second case, with 1.2M tuples on the large table,
which are *all distinct*.
This case, though to a lesser extent than the first case (12x speedup),
benefits as well from hashing on the large table (4x speedup).


On 12 May 2015 at 12:55, Stefan Manegold <Stefan.Manegold at cwi.nl> wrote:

>
> Thanks!
>
> A first *guess*:
>
> With the hash table on the smaller and the 1.6 M probes from the larger,
> the hash is calculated for each of the 1.6 M tuples, "although" there are
> only 16 distinct values;
> apparently, the hash values are not stored ("cached") in the string heap,
> or at least not re-used.
>
> With the hash table on the larger and a single probe from the smaller,
> the string hash appears to be calculated only once per each of the 16
> distinct values,
> (or is already pre-calculated from the string inserts?)
> and then is re-used during hash table build.
>
> Stefan
>
> as a side note, the "trickling in" information about detailed data and
> workload characteristics
> slowly seem to clearify / sharpen the picture ...
>
> ----- On May 12, 2015, at 12:45 PM, Roberto Cornacchia
> roberto.cornacchia at gmail.com wrote:
>
> > Yes, debug build :/
> >
> > However, callgrind shows that on the "bad" run (1.6M probes against 1
> tuple),
> > the majority of the time is spent on strHash, which contains no
> assertion.
> >
> > For now, in attachment the timings I got with the debug build.
> >
> > Roberto
> >
> > On 12 May 2015 at 12:30, Stefan Manegold < Stefan.Manegold at cwi.nl >
> wrote:
> >
> >
> > Roberto,
> >
> > do you run a debug or an optimized build?
> >
> > if the former, try the latter or at least disable assertions.
> >
> > With a debug build (i.e., with assertions enabled)
> > you might meassure the costs for (potentially expensive) assertions,
> > rather than for pure execution ...
> >
> > Stefan
> >
> > ----- On May 12, 2015, at 11:00 AM, Roberto Cornacchia
> > roberto.cornacchia at gmail.com wrote:
> >
> >> One more detail: I ran all tests with sequential_pipe
> >>
> >> On 12 May 2015 at 10:58, Roberto Cornacchia <
> roberto.cornacchia at gmail.com >
> >> wrote:
> >>
> >>
> >>
> >>
> >>
> >> On 11 May 2015 at 22:58, Stefan Manegold < Stefan.Manegold at cwi.nl >
> wrote:
> >>
> >>
> >> Roberto,
> >>
> >> just to recap all facts:
> >>
> >> - MonetDB Oct2014-SP3
> >>
> >> - equality join between 2 string-BATs
> >>
> >> - both BATs are persistent "base" BATs
> >>
> >> Correct
> >>
> >>
> >>
> >> - larger BAT has 16 M BUNs
> >>
> >>
> >> Apologies, I had misread the count here. It is 1.6M
> >> But this doesn't change things.
> >>
> >>
> >>
> >> - smaller BAT has 1 BUN
> >>
> >> - when (forcefully) building the hash on the larger one,
> >> and then performing a single probe from the smaller one,
> >> the first ("cold") join takes 30 ms (building the hash table),
> >> while any next ("hot") one takes 0.8 ms (re-using the pre-built hash
> table).
> >> This suggests ~29.2 ms for building the 16 M hash table
> >> and 0.8 ms for a single probe into that hash table.
> >>
> >>
> >> Correct (except 16M -> 1.6M)
> >>
> >>
> >> - when building the 1 BUN hash table on the smaller one,
> >> and then performing 16 M probes from the larger one,
> >> the (first?) ("cold"?) join takes 430 ms?
> >>
> >>
> >> Correct (except 16M -> 1.6M)
> >>
> >>
> >> + How long does a subsequent ("hot") join take (re-using the pre-built
> hash
> >> table)?
> >>
> >>
> >> Exactly the same, as expected. The pre-built hash table on the 1-tuple
> bat can
> >> hardly be useful
> >>
> >>
> >>
> >> Could you run detailed profiling (e.g., using valgrind/callgrind) to
> analyze
> >> where
> >> the time goes in all 4 cases (hash on larger vs. hash on smaller &
> "cold" vs.
> >> "hot")?
> >>
> >>
> >> Could you share your data to reproduce and analyze the problem?
> >>
> >>
> >> I'm sending data and profiling by email.
> >> Thank you.
> >>
> >>
> >>
> >>
> >> Thanks!
> >>
> >> Stefan
> >>
> >>
> >> ----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia
> >> roberto.cornacchia at gmail.com wrote:
> >>
> >>>> Also, those 430ms are not invested. The second time will still take
> 430ms. So
> >>>> hashing on a very small bat is never a good investment. On the
> contrary,
> >>>> hashing on a larger (but not too much) table is a good investment.
> The next
> >>>> time a similar query comes in, it will be sub-millisecond.
> >>>
> >>> Well, this is a trade-off that in in general hard to judge.
> >>> If the bigger table / BAT is a base table/BAT, the hash table will
> (nowadays)
> >>> be made persistent and *could* be reused --- whether it indeed will be
> reused,
> >>> we cannot predict. If the bigger table is a transient intermediate
> result,
> >>> re-use is unlikely ...
> >>>
> >>>
> >>> That's fair.
> >>>
> >>>
> >>> Having said that, is your smaller table a base table or an
> intermediate result
> >>> that is (might be) a tiny slice of a large (huge) base table?
> >>> Then current code might build the hash on the entire parent BAT rather
> than on
> >>> the tiny slice ...
> >>>
> >>>
> >>> They both are base tables. The tiny table is created and a single
> insert is
> >>> done. The large one is also a regular table, with NOT NULL constraint
> on the
> >>> join column and the entire table is marked read-only.
> >>>
> >>>
> >>>
> >>> Also: Which version of MonetDB are we talking about?
> >>>
> >>>
> >>> Oct2014 SP3
> >>>
> >>>
> >>> Stefan
> >>>
> >>> --
> >>>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> >>>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> >>>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> >>> _______________________________________________
> >>> developers-list mailing list
> >>> developers-list at monetdb.org
> >>> https://www.monetdb.org/mailman/listinfo/developers-list
> >>>
> >>>
> >>> _______________________________________________
> >>> developers-list mailing list
> >>> developers-list at monetdb.org
> >>> https://www.monetdb.org/mailman/listinfo/developers-list
> >>
> >> --
> >>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> >>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> >>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> >> _______________________________________________
> >> developers-list mailing list
> >> developers-list at monetdb.org
> >> https://www.monetdb.org/mailman/listinfo/developers-list
> >>
> >>
> >>
> >> _______________________________________________
> >> developers-list mailing list
> >> developers-list at monetdb.org
> >> https://www.monetdb.org/mailman/listinfo/developers-list
> >
> > --
> >| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> >| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> >| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
> >
> >
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
>
> --
> | Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20150512/e003f3a1/attachment.html>


More information about the developers-list mailing list