Join Paths

The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider

	a:= bat.new(:oid,:oid);
	b:= bat.new(:oid,:oid);
	c:= bat.new(:oid,:str);
	j1:= algebra.join(a,b);
	j2:= algebra.join(j1,c);
	j3:= algebra.join(b,b);
	j4:= algebra.join(b,j3);

The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.

	a:= bat.new(:oid,:oid);
	j1:= algebra.join(a,b);
	j2:= algebra.joinPath(a,b,c);
	j3:= algebra.join(b,b);
	j4:= algebra.joinPath(b,b,b);

The same operation is applied to sequences of leftjoin operations.

Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.

In principle, the joinpaths may contain common subpaths, whose materialization could improve performance.

The SQL front-end produces often produces snippets of the following structure

	t1:= algebra.join(b,c);
	z1:= algebra.join(a,t1);
	...
	t2:= algebra.join(b,d);
	z2:= algebra.join(a,t2);

The joinpath would merge them into

	z1:= algebra.joinPath(a,b,c);
	...
	z2:= algebra.joinPath(a,b,d);

which are handle by a heuristic looking at the first two argments and re-uses a materialized join.

	x13:= algebra.join(a,b);
	z1:= algebra.join(x13,c);
	...
	z2:= algebra.join(x13,d);

An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.

	x3:= algebra.join(a,b);
	r3:= bat.reverse(x3);
	j1:= join(c,r3);
	rb:= bat.reverse(b);
	ra:= bat.reverse(a);
	j1:= algebra.joinpath(c,rb,ra);

As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.