Optimizer Pipelines

Optimizer Pipelines mk Sat, 06/05/2010 - 08:20

The code produced by MonetDB/SQL is massaged by an optimizer pipeline.  The pipeline to be used is identified by the SQL global session variable optimizer, which can be modified using a SQL assignment, e.g. the minimal optimizer pipeline is set using:

sql> select optimizer;
+--------------+
| single value |
+==============+
| default_pipe |
+--------------+
sql> set optimizer='minimal_pipe';
sql> select optimizer;
+--------------+
| single value |
+==============+
| minimal_pipe |
+--------------+

Each pipeline consists of a sequence of MAL function calls that inspect and transform the plan for better execution. The preferred optimizer can be changed by either providing its name or the complete MAL function sequence.  For example the minimal pipe can be set also using:

sql> set optimizer='optimizer.inline();optimizer.remap();optimizer.deadcode();optimizer.multiplex();
optimizer.generator();optimizer.profiler();optimizer.candidates();optimizer.garbageCollector();';

The final result of the optimizer steps becomes visible using the SQL EXPLAIN statement modifier. Alternatively, the SQL DEBUG statement modifier in combination with the 'o' command provides access to the intermediate optimizer results.

Several optimizer pipelines are pre-defined in the MonetDB sources. They represent ongoing development activities and experimental code. Their scope of applicability and/or stability has not reached a satisfactory level to include it in the default pipeline. The predefined optimizer pipelines can be inspected as follows:

sql> select * from sys.optimizers;

name description & definition status
minimal_pipe The minimal pipeline necessary by the server to operate correctly. stable
  optimizer.inline();optimizer.remap();optimizer.deadcode();optimizer.multiplex();optimizer.generator();
optimizer.profiler();optimizer.candidates();optimizer.garbageCollector();
default_pipe The default pipeline contains the mitosis-mergetable-reorder optimizers, aimed at large tables and improved access locality. stable
  optimizer.inline();optimizer.remap();optimizer.costModel();optimizer.coercions();optimizer.aliases();
optimizer.evaluate();optimizer.emptybind();optimizer.pushselect();optimizer.aliases();optimizer.mitosis();
optimizer.mergetable();optimizer.deadcode();optimizer.aliases();optimizer.constants();optimizer.commonTerms();
optimizer.projectionpath();optimizer.deadcode();optimizer.reorder();optimizer.matpack();optimizer.dataflow();
optimizer.querylog();optimizer.multiplex();optimizer.generator();optimizer.profiler();optimizer.candidates();
optimizer.deadcode();optimizer.postfix();optimizer.wlc();optimizer.garbageCollector();
oltp_pipe The pipeline optimised for online transaction processing. stable
  optimizer.inline();optimizer.remap();optimizer.costModel();optimizer.coercions();optimizer.evaluate();
optimizer.emptybind();optimizer.pushselect();optimizer.aliases();optimizer.mitosis();optimizer.mergetable();
optimizer.deadcode();optimizer.aliases();optimizer.constants();optimizer.commonTerms();
optimizer.projectionpath();optimizer.deadcode();optimizer.reorder();optimizer.matpack();optimizer.dataflow();
optimizer.querylog();optimizer.multiplex();optimizer.generator();optimizer.profiler();optimizer.candidates();
optimizer.deadcode();optimizer.postfix();optimizer.oltp();optimizer.wlc();optimizer.garbageCollector();
volcano_pipe The pipeline to optimize queries for volcano style processing. stable
  optimizer.inline();optimizer.remap();optimizer.costModel();optimizer.coercions();optimizer.aliases();
optimizer.evaluate();optimizer.emptybind();optimizer.pushselect();optimizer.aliases();optimizer.mitosis();
optimizer.mergetable();optimizer.deadcode();optimizer.aliases();optimizer.constants();optimizer.commonTerms();
optimizer.projectionpath();optimizer.deadcode();optimizer.reorder();optimizer.matpack();optimizer.dataflow();
optimizer.querylog();optimizer.multiplex();optimizer.generator();optimizer.volcano();optimizer.profiler();
optimizer.candidates();optimizer.deadcode();optimizer.postfix();optimizer.wlc();optimizer.garbageCollector();
no_mitosis_pipe The no_mitosis pipeline is identical to the default pipeline, except that optimizer mitosis is omitted. It is used mainly to make some tests work deterministically, and to check/debug whether "unexpected" problems are related to mitosis (and/or mergetable). stable
  optimizer.inline();optimizer.remap();optimizer.costModel();optimizer.coercions();optimizer.aliases();
optimizer.evaluate();optimizer.emptybind();optimizer.pushselect();optimizer.aliases();optimizer.mergetable();
optimizer.deadcode();optimizer.aliases();optimizer.constants();optimizer.commonTerms();
optimizer.projectionpath();optimizer.deadcode();optimizer.reorder();optimizer.matpack();optimizer.dataflow();
optimizer.querylog();optimizer.multiplex();optimizer.generator();optimizer.profiler();optimizer.candidates();
optimizer.deadcode();optimizer.postfix();optimizer.wlc();optimizer.garbageCollector();
sequential_pipe The sequential pipeline is identical to the default pipeline, except that optimizers mitosis & dataflow are omitted. It is used mainly to make some tests work deterministically, i.e., avoid ambigious output, by avoiding parallelism. stable
  optimizer.inline();optimizer.remap();optimizer.costModel();optimizer.coercions();optimizer.aliases();
optimizer.evaluate();optimizer.emptybind();optimizer.pushselect();optimizer.aliases();optimizer.mergetable();
optimizer.deadcode();optimizer.aliases();optimizer.constants();optimizer.commonTerms();
optimizer.projectionpath();optimizer.deadcode();optimizer.reorder();optimizer.matpack();
optimizer.querylog();optimizer.multiplex();optimizer.generator();optimizer.profiler();optimizer.candidates();
optimizer.deadcode();optimizer.postfix();optimizer.wlc();optimizer.garbageCollector();

 

The default SQL optimization pipe line contains the following steps:

inline Inline functions identified as such.
remap Locate hardwired multiplex operations.
costModel Inspects the SQL catalog for size information.
coercions Performs static type coercions.
aliases Remove alias assignments.
evaluate Evaluate constant expressions.
emptybind Removes empty bindings
pushselect Push selections through e.g. joins.
aliases Remove alias assignments.
mitosis Horizontal fragmentation.
mergetable Expand horizontal fragmented plans.
deadcode Remove all code not leading to used results.
aliases Remove alias assignments.
constants Evaluate constant expressions.
commonterms Remove duplicate expressions.
projectionpath Searchs multiple joins and glues them together for better runtime optimization.
deadcode Remove all code not leading to used results.
reorder Reorder the execution order.
matpack Recombine partitions into a single BAT.
dataflow item Prepare code for multi-core execution
querylog Keep track of SQL query definitions and invocations.
multiplex Expand all remaining multiplex operations to iterators.
generator Implement for-loop generator as table producing function.
profiler Highlight the candidate lists in the code.
candidates Highlight the candidate lists in the code.
deadcode Remove all code not leading to used results.
postfix Post fix.
wlc Workload Capture.
garbageCollector Injects calls to the garbage collector to free up space.

 

A user-defined optimizer pipeline is checked against the dependency information maintained in the optimizer library to ensure there are no conflicts and at least the pre-requisite optimizers are used.