Cost Model

Cost models form the basis for many optimization decisions. The cost parameters are typically the size of the (intermediate) results and response time. Alternatively, they are running aggregates, e.g. max memory and total execution time, obtained from a simulated run. The current implementation contains a framework and an example for building your own cost-based optimized.

The optimizer.costModel() works its way through a MAL program in search for relational operators and estimates their result size. The estimated size is left behind as the property rows.

    r{rows=100} := bat.new(:oid,:int);
    s{rows=1000}:= bat.new(:oid,:int);
    rs:= algebra.select(s,1,1);
    rr:= bat.reverse(r);
    j:= algebra.join(rs,rr);
    optimizer.costModel();

changes the properties of the instructions as follows:

    r{rows=100} := bat.new(:oid,:int);
    s{rows=1000} := bat.new(:oid,:int);
    rs{rows=501} := algebra.select(s,1,1);
    rr{rows=100} := bat.reverse(r);
    j{rows=100} := algebra.join(rs,rr);

The cost estimation does not use any statistics on the actual data distribution yet. It relies on the rows property provided by the front-end or other optimizers. It just applies a few heuristic cost estimators. However, it ensures that empty results are only tagged with rows=0 if the estimate is accurate, otherwise it assumes at least one result row. This property makes it possible to safely pass the result of the cost estimation to the emptySet optimizer for code reduction.