Skip to main content

Join Paths

5.2.15 Join Paths

The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider

	j1:= algebra.join(a,b);
	j2:= algebra.join(j1,c);
	j3:= algebra.join(b,b);
	j4:= algebra.join(b,j3);

The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.

	j1:= algebra.join(a,b);
	j2:= algebra.joinPath(a,b,c);
	j3:= algebra.join(b,b);
	j4:= algebra.joinPath(b,b,b);

The same operation is applied to sequences of leftjoin operations.

Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.

In principle, the joinpaths may contain common subpaths, whose materialization would improve performance.

The SQL front-end produces often produces snippets of the following structure

    t1:= algebra.join(b,c);
    z1:= algebra.join(a,t1);
    t2:= algebra.join(b,d);
    z2:= algebra.join(a,t2);

The joinpath would merge them into

    z1:= algebra.joinPath(a,b,c);
    z2:= algebra.joinPath(a,b,d);

which are handle by a heuristic looking at the first two argments and re-uses a materialized join.

    _13:= algebra.join(a,b);
    z1:= algebra.join(_13,c);
    z2:= algebra.join(_13,d);

An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.

    x3:= algebra.join(a,b);
	r3:= bat.reverse(x3);
	j1:= join(c,r3);
	rb:= bat.reverse(b);
	ra:= bat.reverse(a);
	j1:= algebra.joinpath(c,rb,ra);

As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.