Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Tue May 12 12:45:48 CEST 2015


Yes, debug build :/

However, callgrind shows that on the "bad" run (1.6M probes against 1
tuple), the majority of the time is spent on strHash, which contains no
assertion.

For now, in attachment the timings I got with the debug build.

Roberto

On 12 May 2015 at 12:30, Stefan Manegold <Stefan.Manegold at cwi.nl> wrote:

> Roberto,
>
> do you run a debug or an optimized build?
>
> if the former, try the latter or at least disable assertions.
>
> With a debug build (i.e., with assertions enabled)
> you might meassure the costs for (potentially expensive) assertions,
> rather than for pure execution ...
>
> Stefan
>
> ----- On May 12, 2015, at 11:00 AM, Roberto Cornacchia
> roberto.cornacchia at gmail.com wrote:
>
> > One more detail: I ran all tests with sequential_pipe
> >
> > On 12 May 2015 at 10:58, Roberto Cornacchia <
> roberto.cornacchia at gmail.com >
> > wrote:
> >
> >
> >
> >
> >
> > On 11 May 2015 at 22:58, Stefan Manegold < Stefan.Manegold at cwi.nl >
> wrote:
> >
> >
> > Roberto,
> >
> > just to recap all facts:
> >
> > - MonetDB Oct2014-SP3
> >
> > - equality join between 2 string-BATs
> >
> > - both BATs are persistent "base" BATs
> >
> > Correct
> >
> >
> >
> > - larger BAT has 16 M BUNs
> >
> >
> > Apologies, I had misread the count here. It is 1.6M
> > But this doesn't change things.
> >
> >
> >
> > - smaller BAT has 1 BUN
> >
> > - when (forcefully) building the hash on the larger one,
> > and then performing a single probe from the smaller one,
> > the first ("cold") join takes 30 ms (building the hash table),
> > while any next ("hot") one takes 0.8 ms (re-using the pre-built hash
> table).
> > This suggests ~29.2 ms for building the 16 M hash table
> > and 0.8 ms for a single probe into that hash table.
> >
> >
> > Correct (except 16M -> 1.6M)
> >
> >
> > - when building the 1 BUN hash table on the smaller one,
> > and then performing 16 M probes from the larger one,
> > the (first?) ("cold"?) join takes 430 ms?
> >
> >
> > Correct (except 16M -> 1.6M)
> >
> >
> > + How long does a subsequent ("hot") join take (re-using the pre-built
> hash
> > table)?
> >
> >
> > Exactly the same, as expected. The pre-built hash table on the 1-tuple
> bat can
> > hardly be useful
> >
> >
> >
> > Could you run detailed profiling (e.g., using valgrind/callgrind) to
> analyze
> > where
> > the time goes in all 4 cases (hash on larger vs. hash on smaller &
> "cold" vs.
> > "hot")?
> >
> >
> > Could you share your data to reproduce and analyze the problem?
> >
> >
> > I'm sending data and profiling by email.
> > Thank you.
> >
> >
> >
> >
> > Thanks!
> >
> > Stefan
> >
> >
> > ----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia
> > roberto.cornacchia at gmail.com wrote:
> >
> >>> Also, those 430ms are not invested. The second time will still take
> 430ms. So
> >>> hashing on a very small bat is never a good investment. On the
> contrary,
> >>> hashing on a larger (but not too much) table is a good investment. The
> next
> >>> time a similar query comes in, it will be sub-millisecond.
> >>
> >> Well, this is a trade-off that in in general hard to judge.
> >> If the bigger table / BAT is a base table/BAT, the hash table will
> (nowadays)
> >> be made persistent and *could* be reused --- whether it indeed will be
> reused,
> >> we cannot predict. If the bigger table is a transient intermediate
> result,
> >> re-use is unlikely ...
> >>
> >>
> >> That's fair.
> >>
> >>
> >> Having said that, is your smaller table a base table or an intermediate
> result
> >> that is (might be) a tiny slice of a large (huge) base table?
> >> Then current code might build the hash on the entire parent BAT rather
> than on
> >> the tiny slice ...
> >>
> >>
> >> They both are base tables. The tiny table is created and a single
> insert is
> >> done. The large one is also a regular table, with NOT NULL constraint
> on the
> >> join column and the entire table is marked read-only.
> >>
> >>
> >>
> >> Also: Which version of MonetDB are we talking about?
> >>
> >>
> >> Oct2014 SP3
> >>
> >>
> >> Stefan
> >>
> >> --
> >>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> >>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> >>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> >> _______________________________________________
> >> developers-list mailing list
> >> developers-list at monetdb.org
> >> https://www.monetdb.org/mailman/listinfo/developers-list
> >>
> >>
> >> _______________________________________________
> >> developers-list mailing list
> >> developers-list at monetdb.org
> >> https://www.monetdb.org/mailman/listinfo/developers-list
> >
> > --
> >| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> >| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> >| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
> >
> >
> >
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
>
> --
> | Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20150512/3ec7b2b2/attachment.html>
-------------- next part --------------
MonetDB 5 server v11.19.12 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
  libpcre: 8.35 2014-04-04 (compiled with 8.35)
  openssl: OpenSSL 1.0.1k 8 Jan 2015 (compiled with OpenSSL 1.0.1k-fips 8 Jan 2015)
  libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: roberto at photon.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc  -g  -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking    : /usr/bin/ld -m elf_x86_64 


=========== Q1 ===========
==========================
Equality join on string column

== Left ==

- 1.6M strings (not null)
- 16 unique values
- not sorted

== Right ==

- 1 string

== Results ==

- Hash on right (small), cold: 430ms
- Hash on right (small), hot: 430ms

- Hash on left (large), cold: 35ms
- Hash on left (large), hot: 1ms

== Observations == 

The fact that the large table has only 16 distinct values makes the hash table very small. 
Still, 1.6M hash function calls are needed. Why is that much faster than 1.6M probes against 1 tuple?



=========== Q2 ===========
==========================
Equality join on string column

== Left ==

- 1.2M strings (not null, unique)
- 1.2M unique values
- not sorted

== Right ==

- 1 string

== Results ==

- Hash on right (small), cold: 240ms
- Hash on right (small), hot: 240ms

- Hash on left (large), cold: 57ms
- Hash on left (large), hot: 1ms

== Observations == 

The sting colunt on left is marked not null and unique. Shouldn't it have a hash table already?



More information about the developers-list mailing list