hash join - why hashing on the smaller bat?

Roberto Cornacchia roberto.cornacchia at gmail.com
Thu Apr 21 18:21:55 CEST 2016

Related to my previous question about persisting hashes, I would like to
ask another one.

BATsubjoin has a series of heuristics to decide what type of join
implementation to use. When using hash-join, the last rule says: if
nothing else applies, build a hash on the smaller bat.
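
To make the two options concrete, here is a minimal Python sketch of a hash
join where the build side can be chosen. It is only an illustration under
assumptions: it is not the BATsubjoin code, and the names hash_join and
build_on are hypothetical. The result is the same either way; only the side
that gets hashed (and the side that gets scanned to probe) changes.

```python
from collections import defaultdict


def hash_join(left, right, build_on="smaller"):
    """Return sorted (left_pos, right_pos) pairs where left[i] == right[j].

    build_on is a hypothetical switch: "smaller" hashes the shorter input,
    "larger" hashes the longer one. The other input is scanned to probe.
    """
    if build_on == "smaller":
        build_left = len(left) <= len(right)
    elif build_on == "larger":
        build_left = len(left) > len(right)
    else:
        raise ValueError("build_on must be 'smaller' or 'larger'")

    build, probe = (left, right) if build_left else (right, left)

    # Build phase: map each value to the positions where it occurs.
    table = defaultdict(list)
    for pos, value in enumerate(build):
        table[value].append(pos)

    # Probe phase: scan the other side and look up each value.
    pairs = []
    for pos, value in enumerate(probe):
        for match in table.get(value, ()):
            # Always emit pairs in (left_pos, right_pos) order.
            pairs.append((match, pos) if build_left else (pos, match))
    return sorted(pairs)


small = [3, 1, 3]
large = [1, 2, 3, 3, 4]
on_small = hash_join(small, large, build_on="smaller")
on_large = hash_join(small, large, build_on="larger")
```

Both calls return the same pairs. The cost difference between them comes from
how large the hash table is and how many lookups are done against it.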

Could you tell me what the rationale for this is?

From what I could verify:
- when sizes are comparable: it doesn't really make much difference which
side is hashed
- when sizes differ much: sure, building the hash table on the smaller bat
is much cheaper, but the join as a whole becomes 4-5 times slower than when
hashing on the larger bat.

In which cases is hashing on the larger bat a good option?
