The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider
a:= bat.new(:oid,:oid);
b:= bat.new(:oid,:oid);
c:= bat.new(:oid,:str);
j1:= algebra.join(a,b);
j2:= algebra.join(j1,c);
j3:= algebra.join(b,b);
j4:= algebra.join(b,j3);
The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.
a:= bat.new(:oid,:oid);
j1:= algebra.join(a,b);
j2:= algebra.joinPath(a,b,c);
j3:= algebra.join(b,b);
j4:= algebra.joinPath(b,b,b);
The same operation is applied to sequences of leftjoin operations.
Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.
In principle, the joinpaths may contain common subpaths, whose materialization could improve performance.
The SQL front-end produces often produces snippets of the following structure
t1:= algebra.join(b,c);
z1:= algebra.join(a,t1);
...
t2:= algebra.join(b,d);
z2:= algebra.join(a,t2);
The joinpath would merge them into
z1:= algebra.joinPath(a,b,c);
...
z2:= algebra.joinPath(a,b,d);
which are handle by a heuristic looking at the first two argments and re-uses a materialized join.
x13:= algebra.join(a,b);
z1:= algebra.join(x13,c);
...
z2:= algebra.join(x13,d);
An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.
x3:= algebra.join(a,b);
r3:= bat.reverse(x3);
j1:= join(c,r3);
rb:= bat.reverse(b);
ra:= bat.reverse(a);
j1:= algebra.joinpath(c,rb,ra);
As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.