Join paths

Join paths mk Tue, 03/30/2010 - 16:10

5.2.15 Join Paths

The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider

	a:= bat.new(:oid,:oid);
	b:= bat.new(:oid,:oid);
	c:= bat.new(:oid,:str);
	j1:= algebra.join(a,b);
	j2:= algebra.join(j1,c);
	j3:= algebra.join(b,b);
	j4:= algebra.join(b,j3);

The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.

	a:= bat.new(:oid,:oid);
	j1:= algebra.join(a,b);
	j2:= algebra.joinPath(a,b,c);
	j3:= algebra.join(b,b);
	j4:= algebra.joinPath(b,b,b);

The same operation is applied to sequences of leftjoin operations.

Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.

In principle, the joinpaths may contain common subpaths, whose materialization would improve performance.

The SQL front-end produces often produces snippets of the following structure

    t1:= algebra.join(b,c);
    z1:= algebra.join(a,t1);
	...
    t2:= algebra.join(b,d);
    z2:= algebra.join(a,t2);

The joinpath would merge them into

    z1:= algebra.joinPath(a,b,c);
	...
    z2:= algebra.joinPath(a,b,d);

which are handle by a heuristic looking at the first two argments and re-uses a materialized join.

    _13:= algebra.join(a,b);
    z1:= algebra.join(_13,c);
	...
    z2:= algebra.join(_13,d);

An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.

    x3:= algebra.join(a,b);
	r3:= bat.reverse(x3);
	j1:= join(c,r3);
	rb:= bat.reverse(b);
	ra:= bat.reverse(a);
	j1:= algebra.joinpath(c,rb,ra);

As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.