Yes, debug build :/
However, callgrind shows that on the "bad" run (1.6M probes against 1 tuple), the majority of the time is spent on strHash, which contains no assertion.
For now, attached are the timings I got with the debug build.
Roberto
On 12 May 2015 at 12:30, Stefan Manegold Stefan.Manegold@cwi.nl wrote:
Roberto,
Do you run a debug or an optimized build?
If the former, try the latter, or at least disable assertions.
With a debug build (i.e., with assertions enabled) you might measure the cost of (potentially expensive) assertions rather than of pure execution ...
Stefan
----- On May 12, 2015, at 11:00 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
One more detail: I ran all tests with sequential_pipe
On 12 May 2015 at 10:58, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
On 11 May 2015 at 22:58, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Roberto,
just to recap all facts:
- MonetDB Oct2014-SP3
- equality join between 2 string-BATs
- both BATs are persistent "base" BATs
Correct
- larger BAT has 16 M BUNs
Apologies, I had misread the count here. It is 1.6M, but this doesn't change things.
- smaller BAT has 1 BUN
- when (forcefully) building the hash on the larger one, and then performing a single probe from the smaller one, the first ("cold") join takes 30 ms (building the hash table), while any next ("hot") one takes 0.8 ms (re-using the pre-built hash table).
This suggests ~29.2 ms for building the 16 M hash table and 0.8 ms for a single probe into that hash table.
Correct (except 16M -> 1.6M)
- when building the 1 BUN hash table on the smaller one, and then performing 16 M probes from the larger one, the (first?) ("cold"?) join takes 430 ms?
Correct (except 16M -> 1.6M)
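As a rough sanity check on the figures above, the reported timings can be turned into per-BUN costs. This is only a back-of-the-envelope sketch: it uses the corrected count of 1.6M BUNs, assumes that building the 1-BUN hash table costs next to nothing, and all variable names are made up for illustration. The timings come from a debug build, so the absolute numbers should be read with caution.

```python
# Timings reported in this thread, in milliseconds (debug build).
N_LARGE = 1_600_000            # corrected BUN count of the larger BAT
COLD_HASH_ON_LARGE_MS = 30.0   # build hash on larger BAT + 1 probe
HOT_HASH_ON_LARGE_MS = 0.8     # 1 probe, hash table re-used
HASH_ON_SMALL_MS = 430.0       # 1.6M probes into the 1-BUN hash table

# Build cost = cold run minus the single probe that the hot run measures.
build_ms = COLD_HASH_ON_LARGE_MS - HOT_HASH_ON_LARGE_MS

# Per-BUN costs in nanoseconds (1 ms = 1e6 ns).
build_ns_per_bun = build_ms * 1e6 / N_LARGE
probe_ns_per_bun = HASH_ON_SMALL_MS * 1e6 / N_LARGE

print(f"build: {build_ms:.1f} ms total, ~{build_ns_per_bun:.0f} ns per BUN")
print(f"probe: {HASH_ON_SMALL_MS:.1f} ms total, ~{probe_ns_per_bun:.0f} ns per BUN")
```

Under these assumptions, inserting a string into the hash table costs roughly 18 ns per BUN, while probing the 1-BUN table costs roughly 269 ns per BUN, even though both paths have to hash every string of the larger BAT once.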
- How long does a subsequent ("hot") join take (re-using the pre-built hash table)?
Exactly the same, as expected. The pre-built hash table on the 1-tuple BAT can hardly be useful.
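To illustrate why a cached hash table on the 1-tuple side cannot help, here is a toy hash join. It is not MonetDB's implementation: the function names, the dict-based table, the cache keyed by a label, and the counter of hashed values are all assumptions made for illustration. It only counts how many values have to be hashed in each of the four cases (hash on larger vs. hash on smaller, "cold" vs. "hot").

```python
from collections import defaultdict


def build_hash(values, counter):
    """Build a value -> positions table, counting every hashed value."""
    table = defaultdict(list)
    for pos, value in enumerate(values):
        counter["hashed"] += 1
        table[value].append(pos)
    return table


def probe(table, values, counter):
    """Look up every probe value; each lookup hashes the value once."""
    matches = []
    for pos, value in enumerate(values):
        counter["hashed"] += 1
        for hit in table.get(value, ()):
            matches.append((hit, pos))  # (build position, probe position)
    return matches


def hash_join(build_side, probe_side, cache, key):
    """Join two lists, re-using a cached hash table if one exists for key."""
    counter = {"hashed": 0}
    if key not in cache:  # "cold": the table must be built first
        cache[key] = build_hash(build_side, counter)
    matches = probe(cache[key], probe_side, counter)
    return matches, counter["hashed"]


def run_cases(n_large=1000):
    """Return the number of hashed values for all four cases."""
    large = [f"s{i}" for i in range(n_large)]
    small = ["s42"]
    cache = {}
    results = {}
    for label, build_side, probe_side in (
        ("hash on large", large, small),
        ("hash on small", small, large),
    ):
        for temperature in ("cold", "hot"):
            _, hashed = hash_join(build_side, probe_side, cache, label)
            results[(label, temperature)] = hashed
    return results


if __name__ == "__main__":
    for case, hashed in run_cases().items():
        print(case, hashed)
```

In this toy model both "cold" joins hash the same number of values (every value of both inputs once), and only the hash-on-larger variant gets cheaper when "hot", because the probe side then consists of a single value. The model does not explain why the measured "cold" join with the hash on the smaller BAT (430 ms) is so much slower than the one with the hash on the larger BAT (30 ms).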
Could you run detailed profiling (e.g., using valgrind/callgrind) to analyze where the time goes in all 4 cases (hash on larger vs. hash on smaller & "cold" vs. "hot")?
Could you share your data to reproduce and analyze the problem?
I'm sending data and profiling by email. Thank you.
Thanks!
Stefan
----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
Also, those 430 ms are not an investment. The second time will still take 430 ms. So hashing on a very small BAT is never a good investment. On the contrary, hashing on a larger (but not too large) table is a good investment. The next time a similar query comes in, it will be sub-millisecond.
Well, this is a trade-off that is in general hard to judge. If the bigger table / BAT is a base table/BAT, the hash table will (nowadays) be made persistent and *could* be reused; whether it will indeed be reused, we cannot predict. If the bigger table is a transient intermediate result, re-use is unlikely ...
That's fair.
Having said that, is your smaller table a base table or an intermediate result that is (might be) a tiny slice of a large (huge) base table? Then the current code might build the hash on the entire parent BAT rather than on the tiny slice ...
They are both base tables. The tiny table is created and a single insert is done. The large one is also a regular table, with a NOT NULL constraint on the join column, and the entire table is marked read-only.
Also: Which version of MonetDB are we talking about?
Oct2014 SP3
Stefan
--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list