Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Mon May 11 18:36:04 CEST 2015


> This join takes *430ms*.
> I forced swapping l and r, thus built the hash table on the larger bat,
> and then it takes *0.8ms*.

It takes 0.8ms the second time.
The first time, it needs to create the hash table, and then it takes about
Still, much better than 430ms.

Also, those 430ms are not an investment: the second time will still take 430ms.
So hashing on a very small bat is never a good investment. On the contrary,
hashing on a larger (but not too large) table is a good investment: the next
time a similar query comes in, it will be sub-millisecond.
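
To illustrate the reasoning above, here is a minimal Python sketch. It is not
MonetDB code: the `Column` class, its `hash` attribute and the `hash_join`
function are hypothetical names, and it assumes that a hash table, once built
on a column, is kept around and reused by later joins.

```python
class Column:
    """Hypothetical stand-in for a bat: a list of values plus an optional hash."""

    def __init__(self, values):
        self.values = list(values)
        self.hash = None  # assumption: built lazily, then kept for later queries

    def ensure_hash(self):
        """Build the hash table if it is missing; return True if it was built now."""
        built_now = self.hash is None
        if built_now:
            self.hash = {}
            for pos, value in enumerate(self.values):
                self.hash.setdefault(value, []).append(pos)
        return built_now


def hash_join(probe, build):
    """Join two columns on equal values, hashing the `build` side.

    Returns the matching (probe_pos, build_pos) pairs and a flag that tells
    whether the hash table had to be created during this call.
    """
    built_now = build.ensure_hash()
    pairs = [
        (i, j)
        for i, value in enumerate(probe.values)  # one lookup per probe value
        for j in build.hash.get(value, ())
    ]
    return pairs, built_now


small = Column([3, 7])
large = Column(range(10))

first = hash_join(small, large)   # pays for building the hash on `large`
second = hash_join(small, large)  # reuses it: only len(small) lookups remain
print(first[1], second[1])        # True False
```

In this sketch, with the hash on the larger column every later join costs one
lookup per value of the small column. With the hash on the small column, each
join still has to walk the whole larger column, so nothing is gained the
second time.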
