Yes, debug build :/

However, callgrind shows that on the "bad" run (1.6M probes against 1 tuple), the majority of the time is spent in strHash, which contains no assertions.
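
To illustrate why hashing every probe string shows up in the bad orientation, here is a textbook hash-join model in Python. This is a sketch only, not MonetDB's implementation: `CountingHash`, `build` and `probe` are hypothetical names, Python's built-in hash() merely stands in for strHash, and the large side is scaled down to 100k strings. The side that is probed is hashed once per tuple on every run, so with the hash table on the 1-tuple BAT every join re-hashes all strings of the large BAT. With a pre-built hash table on the large BAT, a hot join hashes a single string.

```python
from collections import defaultdict

class CountingHash:
    """Stand-in for a string hash function (e.g. strHash): wraps Python's
    built-in hash() and counts how often it is called."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, s: str) -> int:
        self.calls += 1
        return hash(s)

def build(values, h):
    """Build phase: hash every build-side value once into a bucket table."""
    table = defaultdict(list)
    for i, s in enumerate(values):
        table[h(s)].append(i)
    return table

def probe(table, build_values, probe_values, h):
    """Probe phase: hash every probe-side value once and compare the strings
    in the matching bucket (to rule out hash collisions)."""
    matches = []
    for j, s in enumerate(probe_values):
        for i in table.get(h(s), ()):
            if build_values[i] == s:
                matches.append((i, j))
    return matches

# Scaled down from 1.6M to 100k strings to keep the example quick.
large = [f"key{i}" for i in range(100_000)]
small = ["key42"]

# Hash on the large side: the cold build hashes every string once ...
h_large = CountingHash()
table_large = build(large, h_large)
cold_build = h_large.calls
# ... but a hot join that reuses the table hashes only the single probe.
h_large.calls = 0
hot_matches = probe(table_large, large, small, h_large)
hot_large = h_large.calls

# Hash on the 1-tuple side: even with a pre-built table, every join
# re-hashes all strings of the large side while probing.
h_small = CountingHash()
table_small = build(small, h_small)
h_small.calls = 0
matches_small = probe(table_small, small, large, h_small)
hot_small = h_small.calls

print(cold_build, hot_large, hot_small)  # 100000 1 100000
```

On a cold run both orientations hash roughly the same number of strings in this model, so it does not account for the gap between the two cold joins (30 ms vs. 430 ms); that depends on implementation details the model leaves out.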

For now, attached are the timings I got with the debug build.

Roberto

On 12 May 2015 at 12:30, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Roberto,

do you run a debug or an optimized build?

If the former, try the latter, or at least disable assertions.

With a debug build (i.e., with assertions enabled)
you might measure the cost of (potentially expensive) assertions,
rather than for pure execution ...

Stefan

----- On May 12, 2015, at 11:00 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

> One more detail: I ran all tests with sequential_pipe
>
> On 12 May 2015 at 10:58, Roberto Cornacchia < roberto.cornacchia@gmail.com >
> wrote:
>
> On 11 May 2015 at 22:58, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
>
>
> Roberto,
>
> just to recap all facts:
>
> - MonetDB Oct2014-SP3
>
> - equality join between 2 string-BATs
>
> - both BATs are persistent "base" BATs
>
> Correct
>
> - larger BAT has 16 M BUNs
>
>
> Apologies, I had misread the count here: it is 1.6M, not 16M.
> But this doesn't change things.
>
> - smaller BAT has 1 BUN
>
> - when (forcefully) building the hash on the larger one,
> and then performing a single probe from the smaller one,
> the first ("cold") join takes 30 ms (building the hash table),
> while any next ("hot") one takes 0.8 ms (re-using the pre-built hash table).
> This suggests ~29.2 ms for building the 16 M hash table
> and 0.8 ms for a single probe into that hash table.
>
>
> Correct (except 16M -> 1.6M)
>
>
> - when building the 1 BUN hash table on the smaller one,
> and then performing 16 M probes from the larger one,
> the (first?) ("cold"?) join takes 430 ms?
>
>
> Correct (except 16M -> 1.6M)
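
A rough per-tuple breakdown of the (debug-build) timings confirmed above, dividing by the corrected count of 1.6M BUNs. This is back-of-the-envelope arithmetic that assumes the cost scales linearly with the number of BUNs and treats building the 1-BUN hash table as negligible:

$$
\frac{30\,\text{ms} - 0.8\,\text{ms}}{1.6\times10^{6}} \approx 18\,\text{ns per BUN (hash built on the larger BAT)}
\qquad
\frac{430\,\text{ms}}{1.6\times10^{6}} \approx 269\,\text{ns per probe (hash on the 1-BUN BAT)}
$$

In this crude estimate, each probe into the 1-BUN hash table costs roughly 15 times as much as inserting one BUN into the large hash table.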
>
>
> + How long does a subsequent ("hot") join take (re-using the pre-built hash
> table)?
>
>
> Exactly the same, as expected. A pre-built hash table on the 1-tuple BAT can
> hardly be useful.
>
> Could you run detailed profiling (e.g., using valgrind/callgrind) to analyze
> where the time goes in all 4 cases (hash on larger vs. hash on smaller &
> "cold" vs. "hot")?
>
>
> Could you share your data to reproduce and analyze the problem?
>
>
> I'm sending data and profiling by email.
> Thank you.
>
> Thanks!
>
> Stefan
>
>
> ----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
>
>>> Also, those 430 ms are not an investment: the second time will still take
>>> 430 ms. So hashing on a very small BAT is never a good investment. On the
>>> contrary, hashing on a larger (but not too large) table is a good investment:
>>> the next time a similar query comes in, it will be sub-millisecond.
>>
>> Well, this is a trade-off that is in general hard to judge.
>> If the bigger table / BAT is a base table/BAT, the hash table will (nowadays)
>> be made persistent and *could* be reused --- whether it indeed will be reused,
>> we cannot predict. If the bigger table is a transient intermediate result,
>> re-use is unlikely ...
>>
>>
>> That's fair.
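
The trade-off discussed above can be made concrete with a simple cost model. This is a sketch, assuming that every join after the first reuses the hash table and that timings are stable; `cumulative_ms` and `break_even_runs` are hypothetical helpers. With the debug-build numbers reported in this thread, hashing the larger BAT already wins on the first run. The second call uses made-up numbers to show a case where building on the larger input pays off only after some reuse.

```python
from typing import Optional

def cumulative_ms(cold_ms: float, hot_ms: float, runs: int) -> float:
    """Total time of `runs` identical joins: the first run is cold (it builds
    the hash table), every later run is hot (it reuses that table)."""
    if runs <= 0:
        return 0.0
    return cold_ms + (runs - 1) * hot_ms

def break_even_runs(a_cold: float, a_hot: float,
                    b_cold: float, b_hot: float,
                    max_runs: int = 10_000) -> Optional[int]:
    """Smallest number of runs after which strategy A costs no more in total
    than strategy B, or None if that does not happen within max_runs."""
    for n in range(1, max_runs + 1):
        if cumulative_ms(a_cold, a_hot, n) <= cumulative_ms(b_cold, b_hot, n):
            return n
    return None

# Timings reported in this thread (debug build, 1.6M vs. 1 BUN):
# A = hash on the larger BAT, B = hash on the 1-BUN BAT (no reuse benefit).
print(break_even_runs(30.0, 0.8, 430.0, 430.0))    # 1

# Hypothetical: building on the larger input costs 1000 ms instead.
print(break_even_runs(1000.0, 0.8, 430.0, 430.0))  # 3
```

Whether that reuse actually happens is what cannot be predicted in general; for a transient intermediate result the realistic run count is 1.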
>>
>>
>> Having said that, is your smaller table a base table or an intermediate result
>> that is (might be) a tiny slice of a large (huge) base table?
>> Then current code might build the hash on the entire parent BAT rather than on
>> the tiny slice ...
>>
>>
>> They both are base tables. The tiny table is created and a single insert is
>> done. The large one is also a regular table, with NOT NULL constraint on the
>> join column and the entire table is marked read-only.
>>
>>
>>
>> Also: Which version of MonetDB are we talking about?
>>
>>
>> Oct2014 SP3
>>
>>
>> Stefan
>>
>> --
>>| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
>>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
>> _______________________________________________
>> developers-list mailing list
>> developers-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/developers-list