On 12 May 2015 at 10:58, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:

On 11 May 2015 at 22:58, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Roberto,

just to recap all facts:

- MonetDB Oct2014-SP3

- equality join between 2 string-BATs

- both BATs are persistent "base" BATs

Correct

- larger BAT has 16 M BUNs

Apologies, I had misread the count here. It is 1.6M
But this doesn't change things.

- smaller BAT has 1 BUN

- when (forcefully) building the hash on the larger one,
and then performing a single probe from the smaller one,
the first ("cold") join takes 30 ms (building the hash table),
while any next ("hot") one takes 0.8 ms (re-using the pre-built hash table).
This suggests ~29.2 ms for building the 16 M hash table
and 0.8 ms for a single probe into that hash table.

Correct (except 16M -> 1.6M)

- when building the 1 BUN hash table on the smaller one,
and then performing 16 M probes from the larger one,
the (first?) ("cold"?) join takes 430 ms?

Correct (except 16M -> 1.6M)

+ How long does a subsequent ("hot") join take (re-using the pre-built hash table)?

Exactly the same, as expected. The pre-built hash table on the 1-tuple bat can hardly be useful

Could you run detailed profiling (e.g., using valgrind/callgrind) to analyze where
the time goes in all 4 cases (hash on larger vs. hash on smaller & "cold" vs. "hot")?
Could you share your data to reproduce and analyze the problem?

I'm sending data and profiling by email.
Thank you.

Thanks!

Stefan

----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

>> Also, those 430ms are not invested. The second time will still take 430ms. So
>> hashing on a very small bat is never a good investment. On the contrary,
>> hashing on a larger (but not too much) table is a good investment. The next
>> time a similar query comes in, it will be sub-millisecond.
>
> Well, this is a trade-off that in in general hard to judge.
> If the bigger table / BAT is a base table/BAT, the hash table will (nowadays)
> be made persistent and *could* be reused --- whether it indeed will be reused,
> we cannot predict. If the bigger table is a transient intermediate result,
> re-use is unlikely ...
>
>
> That's fair.
>
>
> Having said that, is your smaller table a base table or an intermediate result
> that is (might be) a tiny slice of a large (huge) base table?
> Then current code might build the hash on the entire parent BAT rather than on
> the tiny slice ...
>
>
> They both are base tables. The tiny table is created and a single insert is
> done. The large one is also a regular table, with NOT NULL constraint on the
> join column and the entire table is marked read-only.
>
>
>
> Also: Which version of MonetDB are we talking about?
>
>
> Oct2014 SP3
>
>
> Stefan
>
> --
>| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
>
> _______________________________________________
> developers-list mailing list
> developers-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list

--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list