hash join - why hashing on the smaller bat?
roberto.cornacchia at gmail.com
Thu Apr 21 18:21:55 CEST 2016
Related to my previous question about persisting hashes, I would like to
ask another one.
BATsubjoin has a series of heuristics to decide which join
implementation to use. When using hash-join, the last rule says: if
no other rule applied, build the hash on the smaller bat.
Could you tell me what is the rationale for this?
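
To make clear what I mean by "hashing on" one side, here is a minimal
sketch in plain Python. It is not the GDK code: the function name
hash_join, the build_on_smaller flag and the list inputs are made up for
illustration only.

```python
from collections import defaultdict

def hash_join(left, right, build_on_smaller=True):
    """Return sorted (left_pos, right_pos) pairs where left[i] == right[j].

    Illustrative only; the names and the flag are hypothetical.
    """
    # Pick the side that gets the hash table (the "build" side).
    build_is_left = (len(left) <= len(right)) == build_on_smaller
    build, probe = (left, right) if build_is_left else (right, left)

    # Build phase: one hash insert per value of the build side.
    table = defaultdict(list)
    for pos, value in enumerate(build):
        table[value].append(pos)

    # Probe phase: one hash lookup per value of the probe side.
    matches = []
    for probe_pos, value in enumerate(probe):
        for build_pos in table.get(value, ()):
            if build_is_left:
                matches.append((build_pos, probe_pos))
            else:
                matches.append((probe_pos, build_pos))
    return sorted(matches)
```

Both settings of the flag return the same pairs. They differ only in
which side pays for the inserts (and determines the table size) and which
side pays for the lookups.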
From what I could verify:
- when sizes are comparable: it doesn't really make much difference which
side is hashed
- when sizes differ a lot: sure, building the hash table on the smaller
bat is much cheaper, but the join as a whole becomes 4-5 times slower than
when hashing on the larger bat.
In which cases is hashing on the larger bat a good option?