Hi Stefan,

Actually, according to my second email, 430 should be compared to 30, not to 0.8. So the difference in speed is "only" 15x.

Still, your question remains. However, even if the two performed equally, the current case would not invest anything..

On 11 May 2015 at 18:40, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:

Hi Roberto,

thanks for reporting!

This need detailed profiling to understand what's happening.
Why are 16M look-ups to a 1-tuple hash table 500x more expensive
than building a 16M hash table (assuming that is it indeed built on the fly,
i.e., does not already exist), and then performing a single look-up ... ?

Stefan


----- On May 11, 2015, at 6:30 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

> Hi there,
>
> Triggered by a suspiciously slow join, I had a look at BATsubjoin, in particular
> at how it is decided which algorithm to use.
>
> I looked at a couple of cases with a join on strings.
> This is one (pseudo syntax):
>
> l.count ~16M
> l.T.sorted = 0
> l.T.revsorted = 0
>
> r.count = 1
> r.T.sorted = 1
> r.T.revsorted = 1
>
> In this case, a hash table on r is created (because it's smaller).
> This join takes 430ms .
> I forced swapping l and r, thus built the hash table on the larger bat, and then
> it takes 0.8ms .
>
> Notice that though r.count = 1 is a bit of an extreme case, it does happen often
> in practice. And the case with a few tuples would not be very different.
>
> The question I wanted to ask is:
>
> is it always right to build the hash table on the small table?
> Perhaps some more heuristics can help?
> Like: if the smaller is very small, and the larger is not extremely large, then
> hash on the larger one?
>
> Roberto
>
>
> _______________________________________________
> developers-list mailing list
> developers-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list