Hashjoin performance with large vs small tables

Roberto Cornacchia roberto.cornacchia at gmail.com
Mon May 11 18:49:53 CEST 2015


>
>
> > Also, those 430ms are not invested. The second time will still take 430ms.
> > So hashing on a very small BAT is never a good investment. On the contrary,
> > hashing on a larger (but not too large) table is a good investment. The next
> > time a similar query comes in, it will be sub-millisecond.
>
> Well, this is a trade-off that is in general hard to judge.
> If the bigger table / BAT is a base table/BAT, the hash table will (nowadays)
> be made persistent and *could* be reused; whether it will indeed be reused,
> we cannot predict. If the bigger table is a transient intermediate result,
> re-use is unlikely ...
>
>
That's fair.
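
To make the trade-off concrete, here is a minimal Python sketch. It is not MonetDB code: `Column`, `get_hash`, `hash_join` and the `persistent` flag are hypothetical names used only for illustration. It assumes that a hash built on a base column is kept and reused, while a hash built on a transient column is thrown away after each join.

```python
from collections import defaultdict


class Column:
    """Toy stand-in for a BAT: a list of values plus an optional cached hash."""

    def __init__(self, values, persistent=False):
        self.values = list(values)
        self.persistent = persistent  # assumption: base columns keep their hash
        self.hash = None              # cached mapping: value -> list of positions
        self.builds = 0               # how many times a hash table was built

    def get_hash(self):
        # Reuse the cached hash table if an earlier join left one behind.
        if self.hash is not None:
            return self.hash
        table = defaultdict(list)
        for pos, value in enumerate(self.values):
            table[value].append(pos)
        self.builds += 1
        if self.persistent:
            self.hash = table
        return table


def hash_join(build, probe):
    """Return (build_pos, probe_pos) pairs where the two columns match."""
    table = build.get_hash()
    return [(b, p)
            for p, value in enumerate(probe.values)
            for b in table.get(value, ())]


if __name__ == "__main__":
    tiny = Column([42])
    large_base = Column(range(1000), persistent=True)
    large_temp = Column(range(1000))  # transient intermediate result

    for _ in range(2):
        hash_join(large_base, tiny)
        hash_join(large_temp, tiny)

    # The base column pays for its hash once; the transient one pays every time.
    print(large_base.builds, large_temp.builds)
```

In this sketch the build cost on the large base column is paid on the first join only, which is the "investment" case; on the transient column (or when the hash is always built on the tiny side) the cost recurs on every query.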


> Having said that, is your smaller table a base table or an intermediate result
> that is (might be) a tiny slice of a large (huge) base table?
> Then current code might build the hash on the entire parent BAT rather than on
> the tiny slice ...
>
>
They are both base tables. The tiny table is created and populated with a
single insert. The large one is also a regular table, with a NOT NULL
constraint on the join column, and the entire table is marked read-only.


> Also: Which version of MonetDB are we talking about?
>
>
Oct2014 SP3


> Stefan
>
> --
> | Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |