Hashjoin performance with large vs small tables

Stefan Manegold Stefan.Manegold at cwi.nl
Tue May 12 12:55:52 CEST 2015


Thanks!

A first *guess*:

With the hash table on the smaller and the 1.6 M probes from the larger,
the hash is calculated for each of the 1.6 M tuples, "although" there are only 16 distinct values;
apparently, the hash values are not stored ("cached") in the string heap, or at least not re-used.

With the hash table on the larger and a single probe from the smaller,
the string hash appears to be calculated only once per each of the 16 distinct values,
(or is already pre-calculated from the string inserts?)
and then is re-used during hash table build.

Stefan

as a side note, the "trickling in" information about detailed data and workload characteristics
slowly seem to clearify / sharpen the picture ...

----- On May 12, 2015, at 12:45 PM, Roberto Cornacchia roberto.cornacchia at gmail.com wrote:

> Yes, debug build :/
> 
> However, callgrind shows that on the "bad" run (1.6M probes against 1 tuple),
> the majority of the time is spent on strHash, which contains no assertion.
> 
> For now, in attachment the timings I got with the debug build.
> 
> Roberto
> 
> On 12 May 2015 at 12:30, Stefan Manegold < Stefan.Manegold at cwi.nl > wrote:
> 
> 
> Roberto,
> 
> do you run a debug or an optimized build?
> 
> if the former, try the latter or at least disable assertions.
> 
> With a debug build (i.e., with assertions enabled)
> you might meassure the costs for (potentially expensive) assertions,
> rather than for pure execution ...
> 
> Stefan
> 
> ----- On May 12, 2015, at 11:00 AM, Roberto Cornacchia
> roberto.cornacchia at gmail.com wrote:
> 
>> One more detail: I ran all tests with sequential_pipe
>> 
>> On 12 May 2015 at 10:58, Roberto Cornacchia < roberto.cornacchia at gmail.com >
>> wrote:
>> 
>> 
>> 
>> 
>> 
>> On 11 May 2015 at 22:58, Stefan Manegold < Stefan.Manegold at cwi.nl > wrote:
>> 
>> 
>> Roberto,
>> 
>> just to recap all facts:
>> 
>> - MonetDB Oct2014-SP3
>> 
>> - equality join between 2 string-BATs
>> 
>> - both BATs are persistent "base" BATs
>> 
>> Correct
>> 
>> 
>> 
>> - larger BAT has 16 M BUNs
>> 
>> 
>> Apologies, I had misread the count here. It is 1.6M
>> But this doesn't change things.
>> 
>> 
>> 
>> - smaller BAT has 1 BUN
>> 
>> - when (forcefully) building the hash on the larger one,
>> and then performing a single probe from the smaller one,
>> the first ("cold") join takes 30 ms (building the hash table),
>> while any next ("hot") one takes 0.8 ms (re-using the pre-built hash table).
>> This suggests ~29.2 ms for building the 16 M hash table
>> and 0.8 ms for a single probe into that hash table.
>> 
>> 
>> Correct (except 16M -> 1.6M)
>> 
>> 
>> - when building the 1 BUN hash table on the smaller one,
>> and then performing 16 M probes from the larger one,
>> the (first?) ("cold"?) join takes 430 ms?
>> 
>> 
>> Correct (except 16M -> 1.6M)
>> 
>> 
>> + How long does a subsequent ("hot") join take (re-using the pre-built hash
>> table)?
>> 
>> 
>> Exactly the same, as expected. The pre-built hash table on the 1-tuple bat can
>> hardly be useful
>> 
>> 
>> 
>> Could you run detailed profiling (e.g., using valgrind/callgrind) to analyze
>> where
>> the time goes in all 4 cases (hash on larger vs. hash on smaller & "cold" vs.
>> "hot")?
>> 
>> 
>> Could you share your data to reproduce and analyze the problem?
>> 
>> 
>> I'm sending data and profiling by email.
>> Thank you.
>> 
>> 
>> 
>> 
>> Thanks!
>> 
>> Stefan
>> 
>> 
>> ----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia
>> roberto.cornacchia at gmail.com wrote:
>> 
>>>> Also, those 430ms are not invested. The second time will still take 430ms. So
>>>> hashing on a very small bat is never a good investment. On the contrary,
>>>> hashing on a larger (but not too much) table is a good investment. The next
>>>> time a similar query comes in, it will be sub-millisecond.
>>> 
>>> Well, this is a trade-off that in in general hard to judge.
>>> If the bigger table / BAT is a base table/BAT, the hash table will (nowadays)
>>> be made persistent and *could* be reused --- whether it indeed will be reused,
>>> we cannot predict. If the bigger table is a transient intermediate result,
>>> re-use is unlikely ...
>>> 
>>> 
>>> That's fair.
>>> 
>>> 
>>> Having said that, is your smaller table a base table or an intermediate result
>>> that is (might be) a tiny slice of a large (huge) base table?
>>> Then current code might build the hash on the entire parent BAT rather than on
>>> the tiny slice ...
>>> 
>>> 
>>> They both are base tables. The tiny table is created and a single insert is
>>> done. The large one is also a regular table, with NOT NULL constraint on the
>>> join column and the entire table is marked read-only.
>>> 
>>> 
>>> 
>>> Also: Which version of MonetDB are we talking about?
>>> 
>>> 
>>> Oct2014 SP3
>>> 
>>> 
>>> Stefan
>>> 
>>> --
>>>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
>>>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>>>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
>>> _______________________________________________
>>> developers-list mailing list
>>> developers-list at monetdb.org
>>> https://www.monetdb.org/mailman/listinfo/developers-list
>>> 
>>> 
>>> _______________________________________________
>>> developers-list mailing list
>>> developers-list at monetdb.org
>>> https://www.monetdb.org/mailman/listinfo/developers-list
>> 
>> --
>>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
>>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
>> _______________________________________________
>> developers-list mailing list
>> developers-list at monetdb.org
>> https://www.monetdb.org/mailman/listinfo/developers-list
>> 
>> 
>> 
>> _______________________________________________
>> developers-list mailing list
>> developers-list at monetdb.org
>> https://www.monetdb.org/mailman/listinfo/developers-list
> 
> --
>| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
> 
> 
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list

-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |


More information about the developers-list mailing list