Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Tue May 12 11:00:36 CEST 2015


One more detail: I ran all tests with sequential_pipe

On 12 May 2015 at 10:58, Roberto Cornacchia <roberto.cornacchia at gmail.com>
wrote:

>
>
> On 11 May 2015 at 22:58, Stefan Manegold <Stefan.Manegold at cwi.nl> wrote:
>
>> Roberto,
>>
>> just to recap all facts:
>>
>> - MonetDB Oct2014-SP3
>>
>> - equality join between 2 string-BATs
>>
>> - both BATs are persistent "base" BATs
>>
>
> Correct
>
>
>>
>> - larger BAT has 16 M BUNs
>>
>>
> Apologies, I had misread the count here. It is 1.6M
> But this doesn't change things.
>
> - smaller BAT has 1 BUN
>>
>> - when (forcefully) building the hash on the larger one,
>>   and then performing a single probe from the smaller one,
>>   the first ("cold") join takes 30 ms (building the hash table),
>>   while any next ("hot") one takes 0.8 ms (re-using the pre-built hash
>> table).
>>   This suggests ~29.2 ms for building the 16 M hash table
>>   and 0.8 ms for a single probe into that hash table.
>>
>>
> Correct (except 16M -> 1.6M)
>
>
>> - when building the 1 BUN hash table on the smaller one,
>>   and then performing 16 M probes from the larger one,
>>   the (first?) ("cold"?) join takes 430 ms?
>>
>>
> Correct (except 16M -> 1.6M)
>
>
>> + How long does a subsequent ("hot") join take (re-using the pre-built
>> hash table)?
>>
>>
> Exactly the same, as expected. The pre-built hash table on the 1-tuple bat
> can hardly be useful
>
>
>>
>> Could you run detailed profiling (e.g., using valgrind/callgrind) to
>> analyze where
>> the time goes in all 4 cases (hash on larger vs. hash on smaller & "cold"
>> vs. "hot")?
>>
> Could you share your data to reproduce and analyze the problem?
>>
>>
> I'm sending data and profiling by email.
> Thank you.
>
>
> Thanks!
>>
>> Stefan
>>
>>
>> ----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia
>> roberto.cornacchia at gmail.com wrote:
>>
>> >> Also, those 430ms are not invested. The second time will still take
>> 430ms. So
>> >> hashing on a very small bat is never a good investment. On the
>> contrary,
>> >> hashing on a larger (but not too much) table is a good investment. The
>> next
>> >> time a similar query comes in, it will be sub-millisecond.
>> >
>> > Well, this is a trade-off that in in general hard to judge.
>> > If the bigger table / BAT is a base table/BAT, the hash table will
>> (nowadays)
>> > be made persistent and *could* be reused --- whether it indeed will be
>> reused,
>> > we cannot predict. If the bigger table is a transient intermediate
>> result,
>> > re-use is unlikely ...
>> >
>> >
>> > That's fair.
>> >
>> >
>> > Having said that, is your smaller table a base table or an intermediate
>> result
>> > that is (might be) a tiny slice of a large (huge) base table?
>> > Then current code might build the hash on the entire parent BAT rather
>> than on
>> > the tiny slice ...
>> >
>> >
>> > They both are base tables. The tiny table is created and a single
>> insert is
>> > done. The large one is also a regular table, with NOT NULL constraint
>> on the
>> > join column and the entire table is marked read-only.
>> >
>> >
>> >
>> > Also: Which version of MonetDB are we talking about?
>> >
>> >
>> > Oct2014 SP3
>> >
>> >
>> > Stefan
>> >
>> > --
>> >| Stefan.Manegold at CWI.nl | DB Architectures (DA) |
>> >| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>> >| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
>> > _______________________________________________
>> > developers-list mailing list
>> > developers-list at monetdb.org
>> > https://www.monetdb.org/mailman/listinfo/developers-list
>> >
>> >
>> > _______________________________________________
>> > developers-list mailing list
>> > developers-list at monetdb.org
>> > https://www.monetdb.org/mailman/listinfo/developers-list
>>
>> --
>> | Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
>> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
>> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
>> _______________________________________________
>> developers-list mailing list
>> developers-list at monetdb.org
>> https://www.monetdb.org/mailman/listinfo/developers-list
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20150512/8c54dfa6/attachment-0001.html>


More information about the developers-list mailing list