MonetDB Internals

 This book describes the architecture and interface to the MonetDB server code.

Architecture overview

General overview. MonetDB belongs to the class of database management systems designed primarilly for datawarehouse environments. They are characterised by large repositories, which are mostly queried for business intelligence decisions. They also appear frequently in the area of science, where observations are collected into a warehouse for subsequent scientific analysis. The design is focussed on bulk processing where ever possible. Aimed at maximal efficient use of the hardware for large scale processing. This design objective is reflected at all levels of the architecture and the functionality offered to the user. Although MonetDB/SQL provides the full-fledged SQL interface, it has not been tuned towards high-volume transaction processing with its accompanying multi-level ACID properties.

Storage model. The storage model deployed in MonetDB is a significant deviation of traditional database systems. It represents relational tables using vertical fragmentation, by  storing each column in a separate {(surrogate,value)} table, called a BAT (Binary Association Table). The left column, often the surrogate or OID (object-identifier), is called the head, and the right column tail.  MonetDB executes a low-level relational algebra called the BAT algebra. Data in execution is always stored in (intermediate) BATs, and even the result of a query is a collection of BATs. BAT storage takes the form of two simple memory arrays, one for the head and one for  the tail column. For variable-width types we split the array into two parts. One part with the concatenated data values and a part with offsets into the former. For small string valued tables the combination implements a value dictionary.

Internally, MonetDB stores columns using memory mapped files. They are optimized for the typical situation that the surrogate column is a densely ascending numerical identifier (0,1,2,..); in which case the head array storage can be kept implicit. Surrogate lookup becomes a fast array index read in the tail. In effect, this use of arrays in virtual memory exploits the fast in-hardware address to disk-block mapping implemented by the  MMU (memory management unit) in a CPU to provide an O(1) positional database lookup mechanism.
From a CPU overhead point of view this compares favorably to B-tree lookup into  slotted pages -- the approach traditionally used in database systems for fast record lookup.

Execution model. The MonetDB kernel is an abstract machine, programmed in the MonetDB Assemblee Language, also called MAL. Each relational algebra operator corresponds to a a MAL instruction.  Each BAT algebra operator maps to a simple MAL instruction, which has zero degrees of freedom: it does not take complex expressions as parameter. Rather, complex expressions are broken into a sequence of BAT algebra operators that each perform a simple operation on an entire column of values (``bulk processing''). This allows the implementation of the BAT algebra to forsake an expression interpreting engine; rather all BAT algebra operations in the implementation map onto simple array operations. For instance, the BAT algebra expression
           R:bat[:oid,:oid]:=select(B:bat[:oid,:int], V:int);
can be implemented at the C code level like:
     for (i = j = 0; i < n; i++)
       if (B.tail[i] == V)
         R.tail[j++] = i;

The BAT algebra operators have the advantage that tight for-loops create high instruction locality which eliminates the instruction cache miss problem. Such simple loops are amenable to compiler optimization (loop pipelining, blocking, strength reduction), and CPU out-of-order speculation.

Software stack MonetDB's query processing scheme is centered around three software layers. The top is formed by the query language parser and a heuristic, language-and data model-specific optimizer to reduce the amount of data produced by intermediates and to exploit catalogue knowledge on join-indices. The output is a logical plan expressed in MAL.

The second tier consists of a collection of optimizer modules, which are assembled into optimization pipelines. The modules provide facilities ranging from symbolic processing  up to just-in-time data distribution and execution. MAL programs are transformed into more efficients ones and sprinkled with resource management directives. The approach breaks with the hitherto omnipresent cost-based optimizers by recognition that not all decisions can be cast together in a single cost formula. Operating on the common binary-relational back-end algebra, these optimizer modules are shared by all front-end data models and query languages.

The third tier, the MAL interpreter, contains the library of highly optimized implementation of the binary relational algebra operators. They maintain properties over the object accessed to gear the selection of subsequent algorithms. For example, the Select operator can benefit both from sorted-ness of the BAT or it may call for a sample to derive the expected sizes. The interface is described in the MAL module sections.

SQL front-end. The BAT storage layer combined with the MAL algebra provides a flexible architecture to accomodate a wide variety of query languages.  The relational front-end decomposes tables by column, in BATs with a dense (non-stored) TID head, and a tail column with values. For each table, a BAT with deleted positions is kept. Delta BATs are designed to delay updates to the main columns, and allow a relatively cheap snapshot isolation mechanism (only the delta BATs are copied). MonetDB/SQL also keeps additional BATs for join indices; and value indices are created on-the-fly.

MAL reference

The primary textual interface to the Monetdb kernel is a simple, assembly-like language, called MAL. The language reflects the virtual machine architecture around the kernel libraries and has been designed for speed of parsing, ease of analysis, and ease of target compilation by query compilers. The language is not meant as a primary programming language, or scripting language. Such use is even discouraged.

Furthermore, a MAL program is considered a specification of intended computation and data flow behavior. It should be understood that its actual evaluation depends on the execution paradigm choosen in a scenario. The program blocks can both be interpreted as ordered sequences of assembler instructions, or as a representation of a data-flow graph that should be resolved in a dataflow driven manner. The language syntax uses a functional style definition of actions and mark those that affect the flow explicitly. Flow of control keywords identify a point to chance the interpretation paradigm and denote a synchronization point.

MAL is the target language for query compilers front-ends. Even simple SQL queries generate a long sequence of MAL instructions. They represent both the administrative actions to ensure binding and transaction control, the flow dependencies to produce the query result, and the steps needed to prepare the result set for delivery to the front-end.

Only when the algebraic structure is too limited (e.g. updates), or the database back-end lacks feasible builtin bulk operators, one has to rely on more detailed flow of control primitives. But even in that case, the basic blocks to be processed by a MAL back-end are considered large, e.g. tens of simple bulk assignment instructions.

The remainder of this chapter provide a concise overview of the language features and illustrative examples

Literals

Literals in MAL follow the lexical conventions of the programming language C. A default type is attached, e.g. the literal 1 is typed as an int value. Likewise, the literal 3.14 is typed flt rather than dbl.

A literal can be coerced to another type by tagging it with a type qualifier, provided a coercion operation is defined. For example, 1:lng marks the literal 1 as of type lng. and "1999-12-10":date creates a date literal.

MonetDB comes with the hardwired types bit, bte, chr, wrd, sht, int, lng, oid, flt, dbl, str and bat, the bat identifier. The kernel code has been optimized to deal with these types efficiently, i.e. without unnecessary function call overheads. In addition, the system supports temporal types date, daytime, time, timestamp, timezone, extensions to deal with IPv4 addresses and URLs using inet, url. This list can be extended with user defined types (See kernel types).

Variables

Variables are denoted by identifers and implicitly defined upon first assignment of a value. They take on a type through a type classifier or inherit it from the context in which they are first used, see MAL Type System.

Variables are organized into two classes, starting with and without an underscore. The latter are reserved as MAL parser temporaries, whose name aligns with an entry in the symbol table. In general they can not be used in MAL programs, but they may become visible in MAL program listings or during debugging.

Temporary target variable act as sinks in the data flow graph.
In this case, an optimizer can group them by underlying type and this way
reduce the symbol table size significantly.

MAL variables are internally represented by their index into
the variable stack. Their default textual representation is '_'<digits>.
Temporary names  are recognized by the parser and an error is
produced if it does not satisfy this condition.

Sharing of the temporary is only permitted if the type
is a priory known and not TYPE_any.

Instruction syntax

A MAL instruction has purposely a simple format. It is syntactically represented by an assignment, where an expression (function call) delivers results to multiple target variables. The assignment patterns recognized are illustrated below.

     (t1,..,t32) := module.fcn(a1,..,a32);
     t1 := module.fcn(a1,..,a32);
     t1 := v1 operator v2;
     t1 := literal;
     (t1,..,tn) := (a1,..,an);

Operators are grouped into user defined modules. Ommission of the module name is interpreter as the user module.

Simple binary arithmetic operations are merely provided as a short-hand, e.g. the expression t:=2+2 is converted directly into t:= calc.+(2,2).

Target variables are optional. The compiler introduces temporary variables to hold the result of the expression upon need. They won't show up when you list the MAL program unless they are used elsewhere.

For parsing simplicity, each instruction should fit on a single line. Comments start with a sharp '#' and continues to the end of the line. They are retained in the internal code representation to ease debugging of compiler generated MAL programs.

The data structure to represent a MAL block is kept simple. It contains a sequence of MAL statements and a symbol table. The MAL instruction record is a code byte string overlaid with the instruction pattern, which contains references into the symbol tables and administrative data for the interpreter.

This method leads to a large allocated block, which can be easily freed. Variable- and statement- block together describe the static part of a MAL procedure. It carries enough information to produce a listing and to aid symbolic debugging.

The MAL syntax is summarized below in extended BNF. Alternative constructs are seperated by | and grouped by parenthesis. Optional parts are marked with square brackets. A repetition is marked with either '+' or '*' to indicate at least once and many times, respectively. Lexical tokens are illustrated in small capitals.

program : (statement ';') *
statement : moduleStmt [helpinfo] | definition [helpinfo]
  | includeStmt | stmt
moduleStmt : module ident | atom ident [':'ident]
includeStmt : include identifier | include string_literal
definition : command header address identifier
  | pattern header address identifier
  | function header statement* end identifier
  | factory header statement* end identifier
helpinfo : comment string_literal
header : [ moduelName '.'] name '(' params ')' result
result : paramType | '(' params ')'
params : binding [',' binding]*
binding : identifier typeName [ props ]
typeName : scalarType | collectionType | ':' any ['_' digit]
scalarType : ':' identifier
collectionType : ':' bat ['[' colType ',' colType ']']
colType : scalarType | anyType
props : '{' property [ ',' property]* '}'
property : identifier [ compare literal]
compare : '=' | '<' | '<=' | '>' | '>=' | '!='
stmt : [tag] varlist [':=' expr ]
tag | return | barrier | catch | leave | redo | raise
varlist : variable | '(' variable [',' variable ] * ')'
variable : identifier [ props ]
expr : fcncall | [factor operator] factor
factor : literal_constant | nil | var
fcncall : moduleName '.' name '(' [args] ')'
args : factor [',' factor]*

Type system

MAL is mostly a strongly typed language. Given the interpretative nature of many of the MAL instructions, when and where type resolution takes place is a critical design issue. Performing it too late, i.e. at each instruction call, leads to performance problems if we derive the same information over and over again. However, many built-in operators have polymorphic typed signatures, so we cannot escape it altogether.

Consider the small illustrative MAL program:

     function sample(nme:str, val:any_1):bit;
c := 2 * 3;
b := bbp.bind(nme); #find a BAT
h := algebra.select(b,val,val);
t := aggr.count(h);
x := io.print(t);
y := io.print(val);
end sample;

The function definition is polymorphic typed on the 2nd argument, it becomes a concrete type upon invocation. The system could attempt a type check, but quickly runs into assumptions that generally do not hold. The first assignment can be type checked during parsing and a symbolic optimizer could even evaluate the expression once. Looking up a BAT in the buffer pool leads to an element :bat[ht,tt] where ht and tt are runtime dependent types, which means that the selection operation can not be type-checked immediately. It is an example of an embedded polypmorphic statement, which requires intervention of the user/optimizer to make the type explicit before the type resolver becomes active. The operation count can be checked, if it is given a BAT argument. This assumes that we can infer that 'h' is indeed a BAT, which requires assurance that algebra.select produces one. However, there are no rules to avoid addition of new operators, or to differentiate among different implementations based on the argument types. Since print(t) contains an undetermined typed argument we should postpone typechecking as well. The last print statement can be checked upon function invocation.

Life becomes really complex if the body contains a loop with variable types. For then we also have to keep track of the original state of the function. Or alternatively, type checking should consider the runtime stack rather than the function definition itself.

These examples give little room to achieve our prime objective, i.e. a fast and early type resolution scheme. Any non-polymorphic function can be type checked and marked type-safe upon completion. Type checking polymorphic functions are post-poned until a concrete type instance is known. It leads to a clone, which can be type checked and is entered into the symbol table.

 

Flow of control

The flow of control within a MAL program block can be changed by tagging a statement with either return, yield, barrier, catch, leave, redo, or exit.

The flow modifiers return and yield mark the end of a call and return one or more results to the calling environment. The return and yield are followed by a target list or an assignment, which is executed first.

The barrier (catch) and exit pair mark a guarded statement block. They may be nested to form a proper hierarchy identified by their primary target variable, also called the control variable.

The leave and redo are conditional flow modifiers. The control variable is used after the assignment statement has been evaluated to decide on the flow-of-control action to be taken. Built-in controls exists for booleans and numeric values. The barrier block is opened when the control variable holds true, when its numeric value >= 0, or when it is a non-empty string. The nil value blocks entry in all cases.

Once inside the barrier you have an option to prematurely leave it at the exit statement or to redo interpretation just after the corresponding barrier statement. Much like 'break' and 'continue' statements in the programming language C. The action is taken when the condition is met.

The exit marks the exit for a block. Its optional assignment can be used to re-initialize the barrier control variables or wrap-up any related administration.

The barrier blocks can be properly nested to form a hierarchy of basic blocks. The control flow within and between blocks is simple enough to deal with during an optimizer stage. The redo and leave statements mark the partial end of a block. Statements within these blocks can be re-arranged according to the data-flow dependencies. The order of partial blocks can not be changed that easily. It depends on the mutual exclusion of the data flows within each partial block.

Common guarded blocks in imperative languages are the for-loop and if-then-else constructs. They can be simulated as follows.

Consider the statement for(i=1;i<10;i++) print(i). The (optimized) MAL block to implement this becomes:

          i:= 1;
     barrier B:= i<10;
          io.print(i);
          i:= i+1;
     redo B:= i<10;
     exit B;

Translation of the statement if(i<1) print("ok"); else print("wrong"); becomes:

         i:=1;
     barrier ifpart:= i<1;
         io.print("ok");
     exit ifpart;
     barrier elsepart:= i>=1;
         io.print("wrong");
     exit elsepart;

Note that both guarded blocks can be interchanged without affecting the outcome. Moreover, neither block would have been entered if the variable happens to be assigned nil.

The primitives are sufficient to model a wide variety of iterators, whose pattern look like:

     barrier i:= M.newIterator(T);
         elm:= M.getElement(T,i);
         ...
         leave i:= M.noMoreElements(T);
         ...
         redo i:= M.hasMoreElements(T);
     exit i:= M.exitIterator(T);

The semantics obeyed by the iterator implementations is as follows. The redo expression updates the target variable i and control proceeds at the first statement after the barrier when the barrier is opened by i. If the barrier could not be re-opened, execution proceeds with the first statement after the redo. Likewise, the leave control statement skips to the exit when the control variable i shows a closed barrier block. Otherwise, it continues with the next instruction. Note, in both failed cases the control variable is possibly changed.

A recurring situation is to iterate over the elements in a BAT. This is supported by an iterator implementation for BATs as follows:

     barrier (idx,hd,tl):= bat.newIterator(B);
         ...
         redo (idx,hd,tl):= bat.hasMoreElements(B);
     exit (ids,hd,tl);

Where idx is an integer to denote the row in the BAT, hd and tl denote values of the current element.

Exceptions

MAL comes with an exception handling mechanism, similar in style as found in modern programming languages. Exceptions are considered rare situations that alter the flow of control to a place where they can be handled. After the exceptional case has been handled the following options exist a) continue where it went wrong, b) retry the failed instruction, c) leave the block where the exception was handled, or d) pass the exception to an enclosing call. The current implementation of the MAL interpreter only supports c) and d).

Exception control

The exception handling keywords are: catch and raise The catch marks a point in the dataflow where an exception raised can be dealt with. Any statement between the point where it is raised and the catch block is ignored. Moreover, the catch ... exit block is ignored when no errors have occurred in the preceeding dataflow structure. Within the catch block, the exception variable can be manipulated without constraints.

An exception message is linked with a exception variable of type string. If this variable is defined in the receiving block, the exception message can be delivered. Otherwise, it implicitly raises the exception in the surrounding scope. The variable ANYexception can be used to catch them irrespective of their class.

After an exception has been dealt with the catch block can be left at the normal exit with the option leave or continue after the failed instruction using a redo. The latter case assumes the caught code block has been able to provide an alternative for the failed instruction. [todo, the redo from failed instruction is not implemented yet]

Both leave and redo are conditional flow of control modifiers, which trigger on a non-empty string variable. An exception raised within a catch-block terminates the function and returns control to the enclosing environment.

The argument to the catch statement is a target list, which holds the exception variables you are interested in.

The snippet below illustrates how an exception raised in the function io.read is caught using the exception variable IOerror. After dealing with it locally, it raises a new exception FATALerror for the enclosing call.

     	io.write("Welcome");
     	...
     catch IOerror:str;
     	print("input error on reading password");
     raise FATALerror:= "Can't handle it";
     exit IOerror;

Since catch is a flow control modifier it can be attached to any assignment statement. This statement is executed whenever there is no exception outstanding, but will be ignored when control is moved to the block otherwise.

Builtin exceptions

The policy implemented in the MAL modules, and recognized by the interpreter, is to return a string value by default. A NULL return value indicates succesful execution; otherwise the string encodes information to analyse the error occurred.

This string pattern is strictly formatted and easy to analyse. It starts with the name of the exception variable to be set, followed by an indication where the exception was raise, i.e. the function name and the program counter, and concludes with specific information needed to interpret and handle the exception.

For example, the exception string 'MALException:Admin.main[2]:address of function missing' denotes an exception raised while typechecking a MAL program.

The exceptions captured within the kernel are marked as 'GDKerror'. At that level there is no knowledge about the MAL context, which makes interpretation difficult for the average programmer. Exceptions in the MAL language layer are denoted by 'MALerror', and query language exceptiosn fall in their own class, e.g. 'SQLerror'. Exceptions can be cascaded to form a trail of exceptions recognized during the exection.

Properties

[THIS FUNCTIONALITY IS PARTLY IMPLEMENTED] The mal_properties module provides a generic scheme to administer property sets and a concise API to manage them. Its design is geared towards support of MAL optimizers, which typically make multiple passes over a program to derive an alternative, better version. Such code-transformations are aided by keeping track of derived information, e.g. the expected size of a temporary result or the alignment property between BATs.

Properties capture part of the state of the system in the form of an simple term expression (name, operator, constant). The property model assumes a namespace built around Identifiers. The operator satisfy the syntax rules for MAL operators. Conditional operators are quite common, e.g. the triple (count, <, 1000) can be used to denote a small table.

The property bearing objects in the MAL setting are variables (symbol table entries). The direct relationship between instructions and a target variable, make it possible to keep the instruction properties in the corresponding target variable.

Variables properties The variables can be extended at any time with a property set. Properties have a scope identical to the scope of the corresponding variable. Ommision of the operator and value turns it into a boolean valued property, whose default value is true.

	b{count=1000,sorted}:= mymodule.action("table");
	name{aligngroup=312} := bbp.take("person_name");
	age{aligngroup=312} := bbp.take("person_age");

The example illustrates a mechanism to maintain alignment information. Such a property is helpful for optimizers to pick an efficient algorithm.

MAL function signatures. A function signature contains a description of the objects it is willing to accept and an indication of the expected result. The arguments can be tagged with properties that hold for the actual arguments. It extends the typing scheme used during compilation/optimization. Likewise, the return values can be tagged with properties that 'at least' exist upon function return.

    function test(b:bat[:oid,:int]{count<1000}):bat[:oid,:int]{sorted}
       #code block
    end test

How to propagate properties? Property inspection and manipulation is strongly linked with the operators of interest. Optimizers continuously inspect and update the properties, while kernel operators should not be bothered with their existence. Property propagation is strongly linked with the actual operator implementation. We examine a few recurring cases.

V:=W; Both V and W should be type compatible, otherwise the compiler will already complain.(Actually, it requires V.type()==W.type() and ~V.isaConstant()) But what happens with all others? What is the property propagation rule for the assignment? Several cases can be distinguished:

I) W has a property P, unknown to V.
II) V has a propery P, unknown to W.
III) V has property P, and W has property Q, P and Q are incompatible.
IV) V and W have a property P, but its value disagrees.

case I). If the variable V was not initialized, we can simply copy or share the properties. Copying might be too expensive, while shareing leads to managing the dependencies. case II) It means that V is re-assigned a value, and depending on its type and properties we may have to 'garbage collect/finalize' it first. Alternatively, it could be interpreted as a property that will hold after assignment which is not part of the right-hand side expression. case III) if P and Q are type compatible, it means an update of the P value. Otherwise, it should generates an exception. case IV) this calls for an update of V.P using the value of W.P. How this should be done is property specific.

Overall, the policy would be to 'disgard' all knowledge from V first and then copy the properties from W.

[Try 1] V:= fcn(A,B,C) and signature fcn(A:int, B:int, C:int):int The signature provides several handles to attach properties. Each formal parameter could come with a list of 'desirable/necessary' properties. Likewise, the return values have a property set. This leads to the extended signature function fcn(A:T,....,B:T): (C:T...D:T) where each Pi denotes a property set. Properties P1..Pn can be used to select the proper function variant [Not yet implemented]. At its worst, several signatures of fcn() should be inspected at runtime to find one with matching properties. To enable analysis and optimization, however, it should be clear that once the function is finished, the properties Pk..Pm exist.

[Try 2] V:= fcn(A,B,C) and signature fcn(A:int,B:int,C:int):int The function is applicable when a (simple conjuntive) predicate over the properties of the actual arguments holds [Not yet implemented]. A side-effect of execution of the function leads to an update of the property set associated with the actual arguments. 

[Try 3] Organize property management by the processor involved, e.g. a cost-based optimizer or a access control enforcer. For each optimizer we should then specify the 'symbolic' effect of execution of instructions of interest. This means 'duplication' of the instruction set.

Can you drop properties? It seems possible, but since property operations occur before actual execution there is no guarantee that they actually take place.

[case: how to handle sort(b:bat):bat as a means to propagate ]
[actually we need an expression language to indicate the propety set, e.g. sort(b:bat):bat which first obtains the properties of b and extends it with sorted. A nested structure emerge

Is it necessary to construct the property list intersection? Or do we need a user defined function to consolidate property lists? ]

Aside, it may be valuable to collect information on e.g. the execution time of functions as a basis for future optimizations. Rather then cluttering the property section, it makes sense to explicitly update this information in a catalog.

3.14 Property ADT implementation

addProperty(O,P) adds property P to the list associated with O. If O represents a compound structure, e.g. a BAT, we should indicate the component as well. For example, addProperty(O,P,Ia,...Ib) introduces a property shared by the components Ia..Ib (indicated with an integer index.

hasProperty(O,P) is a boolean function that merely checks existence hasnotProperty(O,P) is the dual operation.

setProperty(O,P,V) changes the propety value to V. It may raise a PropertyUpdateViolation exception when this can not be realized. Note, the property value itself is changed, not the object referenced.

getProperty(O,P) retrieves the current value of a property. This may involve calling a function or running a database query.

setPropertyAttribute(O,P,A) changes the behavior of the property. For example, the attribute 'freeze' will result in a call to the underlying function only once and to cache the result for the remainder of the objects life time.

3.18 Predefined properties

The MAL language uses a few defaults, recognizable as properties

 

unsafe function has side effects.Default, unsafe=off
read data can be read but not updated
append data can be appended

 

Modules

Describe the set up of modules.

Functions

MAL comes with a standard functional abstraction scheme. Functions are represented by MAL instruction lists, enclosed by a function signature and end statement. The function signature lists the arguments and their types. The end statement marks the end of this sequence. Its argument is the function name.

An illustrative example is:

     function user.helloWorld(msg:str):str;
         io.print(msg);
         msg:= "done";
         return msg;
     end user.helloWorld;

The module name 'user' designates the collection to which this function belongs. A missing module name is considered a reference to the current module, i.e. the last module or atom context openend. All user defined functions are assembled in the module user by default.

The functional abstraction scheme comes with several variations: commands, patterns, and factories. They are discussed shortly.

 

Polymorphism

Polymorphic functions are characterised by type variables denoted by

:any

and an optional index. Each time a polymorphic MAL function is called, the symbol table is first inspected for the matching strongly typed version. If it does not exists, a copy of the MAL program is generated, whereafter the type variables are replaced with their concrete types. The new MAL program is immediately type checked and, if no errors occured, added to the symbol table.

The generic type variable :any designates an unknown type, which may be filled at type resolution time. Unlike indexed polymorphic type arguments, :any type arguments match possibly with different concrete types.

An example of a parameterised function is shown below:

     function user.helloWorld(msg:any_1):any_1;
         io.print(msg);
         return user.helloWorld;
     end helloWorld;

The type variables ensure that the return type equals the argument type. Type variables can be used at any place where a type name is permitted. Beware that polymorphic typed variables are propagated throughout the function body. This may invalidate type resolutions decisions taken earlier (See MAL Type System).

This version of helloWorld can also be used for other arguments types, i.e. bit,sht,lng,flt,dbl,.... For example, calling helloWorld(3.14:flt) echoes a float value.

C-functions

The MAL function body can also be implemented with a C-function. They are introduced to the MAL type checker by providing their signature and an

address qualifier for linkage.

We distinguish both command and pattern C-function blocks. They differ in the information accessible at run time. The command variant calls the underlying C-function, passing pointers to the arguments on the MAL runtime stack. The pattern command is passed pointers to the MAL definition block, the runtime stack, and the instruction itself. It can be used to analyse the types of the arguments directly.

For example, the definitions below link the kernel routine BKCinsert_bun with the function bat.insert(). It does not fully specify the result type. The io.print() pattern applies to any BAT argument list, provided they match on the head column type. Such a polymorphic type list may only be used in the context of a pattern.

     command bat.insert(b:bat[:any_1,:any_2], ht:any_1, tt:any_2)
     	:bat[:any_1,:any_2]
     address BKCinsert_bun;
     
     pattern io.print(b1:bat[:any_1,:any]...):int
     address IOtable;

Factories

A convenient programming construct is the co-routine, which is specified as an ordinary function, but maintains its own state between calls, and permits re-entry other than by the first statement.

The random generator example is used to illustrate its definition and use.

     factory random(seed:int,limit:int):int;
         rnd:=seed;
         lim:= limit;
     barrier lim;
         leave lim:= lim-1;
         rnd:= rnd*125;
         yield rnd:= rnd % 32676;
         redo lim;
     exit lim;
     end random;

The first time this factory is called, a plant is created in the local system to handle the requests. The plant contains the stack frame and synchronizes access.

In this case it initializes the generator. The random number is generated and yield as a result of the call. The factory plant is then put to sleep. The second call received by the factory wakes it up at the point where it went to sleep. In this case it will find a redo statement and produces the next random number. Note that also in this case a seed and limit value are expected, but they are ignored in the body. This factory can be called upon to generate at most 'limit' random numbers using the 'seed' to initialize the generator. Thereafter it is being removed, i.e. reset to the original state.

A cooperative group of factories can be readily constructed. For example, assume we would like the random factories to respond to both random(seed,limit) and random(). This can be defined as follows:

     factory random(seed:int,limit:int):int;
         rnd:=seed;
         lim:= limit;
     barrier lim;
         leave lim:= lim-1;
         rnd:= rnd*125;
         yield rnd:= rnd % 32676;
         redo lim;
     exit lim;
     end random;
     
     factory random():int;
     barrier forever:=true;
         yield random(0,0);
         redo forever;
     exit forever;
     end random;

Example factories

Complex Factories

The factory scheme can be used to model a volcano-style query processor. Each node in the query tree is an iterator that calls upon the operands to produce a chunk, which are combined into a new chunk for consumption of the parent. The prototypical join(R,S) query illustrates it. The plan does not test for all boundary conditions, it merely implements a nested loop. The end of a sequence is identified by a NIL chunk.

     factory query();
         Left:= sql.bind("relationA");
         Right:= sql.bind("relationB");
         rc:= sql.joinStep(Left,Right);
     barrier rc!= nil;
         io.print(rc);
         rc:= sql.joinStep(Left,Right);
         redo rc!= nil;
     exit rc;
     end query;
     
     #nested loop join
     factory sql.joinStep(Left:bat[:any,:any],Right:bat[:any,:any]):bat[:any,:any];
         lc:= bat.chunkStep(Left);
     barrier outer:= lc != nil;
         rc:= bat.chunkStep(Right);
         barrier inner:= rc != nil;
             chunk:= algebra.join(lc,rc);
             yield chunk;
             rc:= bat.chunkStep(Right);
             redo inner:= rc != nil;
         exit inner;
         lc:= bat.chunkStep(Left);
         redo outer:= lc != nil;
     exit outer;
         # we have seen everything
         return nil;
     end joinStep;
     
     #factory for left branch
     factory chunkStepL(L:bat[:any,:any]):bat[:any,:any];
         i:= 0;
         j:= 20;
         cnt:= algebra.count(L);
     barrier outer:= j<cnt;
         chunk:= algebra.slice(L,i,j);
         i:= j;
         j:= i+ 20;
         yield chunk;
         redo loop:= j<cnt;
     exit outer;
         # send last portion
         chunk:= algebra.slice(L,i,cnt);
         yielD chunk;
         return nil;
     end chunkStep;
     
     #factory for right leg
     factory chunkStepR(L:bat[:any,:any]):bat[:any,:any];

So far we haven't re-used the pattern that both legs are identical. This could be modeled by a generic chunk factory. Choosing a new factory for each query steps reduces the administrative overhead.

Materialized Views

An area where factories might be useful are support for materialized views, i.e. the result of a query is retained for ease of access. A simple strategy is to prepare the result once and return it on each successive call. Provided the arguments have not been changed. For example:

     factory view1(l:int, h:int):bat[:oid,:str];
         a:bat[:oid,:int]:= bbp.bind("emp","age");
         b:bat[:oid,:str]:= bbp.bind("emp","name");
     barrier always := true;
         lOld := l;
         hOld := h;
         c := algebra.select(a,l,h);
         d := algebra.semijoin(b,c);
     barrier available := true;
         yield d;
         leave available := calc.!=(lOld,l);
         leave available := calc.!=(hOld,h);
         redo available := true;
     exit available;
         redo always;
     exit always;
     end view1;

The code should be extended to also check validity of the BATs. It requires a check against the last transaction identifier known.

The Factory concept is still rather experimental and many questions should be considered, e.g. What is the lifetime of a factory? Does it persists after all clients has disappeared? What additional control do you need? Can you throw an exception to a Factory?

Ownership

For simple cases, e.g. implementation of a random function, it suffices to ensure that the state is secured between calls. But, in a database context there are multiple clients active. This means we have to be more precise on the relationship between a co-routine and the client for which it works.

The co-routine concept researched in Monet 5 is the notion of a 'factory', which consists of 'factory plants' at possibly different locations and with different policies to handle requests. Factory management is limited to its owner, which is derived from the module in which it is placed. By default Admin is the owner of all modules.

The factory produces elements for multiple clients. Sharing the factory state or even remote processing is up to the factory owner. They are set through properties for the factory plant.

The default policy is to instantiate one shared plant for each factory. If necessary, the factory can keep track of a client list to differentiate the states. A possible implementation would be:

     factory random(seed:int,clientid:int):int;
         clt:= bat.new(:int,:int);
         bat.insert(clt,clientid,seed);
     barrier always:=true;
         rnd:= algebra.find(clt,clientid);
     catch   rnd; #failed to find client
         bat.insert(clt,clientid,seed);
         rnd:= algebra.find(clt,clientid);
     exit    rnd;
         rnd:= rnd * 125;
         rnd:= rnd % 32676;
         algebra.replace(clt,clientid,rnd);
         yield rnd;
         redo always;
     exit always;
     end random;

The operators to built client aware factories are, factories.getCaller(), which returns a client index, factories.getModule() and factories.getFunction(), which returns the identity of scope enclosed.

To illustrate, the client specific random generator can be shielded using the factory:

     factory random(seed:int):int;
     barrier always:=true;
         clientid:= factories.getCaller();
         yield user.random(seed, clientid);
         redo always;
     exit always;
     end random;

Boxed variables

Variable boxes

Clients sessions often come with a global scope of variable settings. Access to these global variables should be easy, but they should also provide protection against concurrent update when the client wishes to perform parallel processing. Likewise, databases, query languages, etc. may define constants and variables accessible, e.g., relational schemas, to a selected user group.

The approach taken is to rely on persistent object spaces as pioniered in Lynda and -later- JavaSpaces. They are called boxes in MonetDB and act as managed containers for persistent variables.

Before a client program can interact with a box, it should open it, passing qualifying authorization information and parameters to instruct the box-manager of the intended use. A built-in box is implicitly opened when you request for its service.

At the end of a session, the box should be closed. Some box-managers may wish to implement a lease-scheme to automatically close interaction with a client when the lease runs out. Likewise, the box can be notified when the last reference to a leased object ceases to exist.

A box can be extended with a new object using the function deposit(name) with name a local variable. The default implementation silently accepts any new definition of the box. If the variable was known already in the box, its value is overwritten.

A local copy of an object can be obtained using the pattern 'take(name,[param])', where name denotes the variable of interest. The type of the receiving variable should match the one known for the object. Whether an actual copy is produced or a reference to a shared object is returned is defined by the box manager.

The object is given back to the box manager calling 'release(name)'. It may update the content of the repository accordingly, release locks, and move the value to persistent store. Whatever the semantics of the box requires. [The default implementation is a no-op]

Finally, the object manager can be requested to 'discard(name)' a variable completely. The default implementation is to reclaim the space in the box.

Concurrency control, replication services, as well as access to remote stores may be delegated to a box manager. Depending on the intended semantics, the box manager may keep track of the clients holding links to this members, provide a traditional 2-phase locking scheme, optimistic control, or check-out/check-in scheme. In all cases, these management issues are transparant to the main thread (=client) of control, which operates on a temporary snapshot. For the time being we realize the managers as critical code sections, i.e. one client is permitted access to the box space at a time.

For example, consider the client function:

     function myfcn():void;
         b:bat[:oid,:int] := bbp.take("mytable");
         c:bat[:int,:str] := sql.take("person","age");
         d:= intersect(b,c);
         io.print(d);
         u:str:= client.take(user);
         io.print(u);
         client.release(user);
     end function;

The function binds to a copy from the local persistent BAT space, much like bat-names are resolved in earlier MonetDB versions. The second statement uses an implementation of take that searches a variable of interest using two string properties. It illustrates that a box manager is free to extend/overload the predefined scheme, which is geared towards storing MAL variables.

The result bat c is temporary and disappears upon garbage collection. The variable u is looked up as the string object user.

Note that BATs b and c need be released at some point. In general this point in time does not coincide with a computational boundary like a function return. During a session, several bats may be taken out of the box, being processed, and only at the end of a session being released. In this example, it means that the reference to b and c is lost at the end of the function (due to garbarge collection) and that subsequent use requires another take() call. The box manager bbp is notified of the implicit release and can take garbage collection actions.

The box may be inspected at several times during a scenario run. The first time is when the MAL program is type-checked for the box operations. Typechecking a take() function is tricky. If the argument is a string literal, the box can be queried directly for the objects' type. If found, its type is matched against the lhs variable. This strategy fails in the situation when at runtime the object is subsequently replaced by another typed-instance in the box. We assume this not to happen and the exceptions it raises a valuable advice to reconsider the programming style.

The type indicator for the destination variable should be provided to proceed with proper type checking. It can resolve overloaded function selection.

Inspection of the Box can be encoded using an iterator at the MAL layer and relying on the functionality of the box. However, to improve introspection, we assume that all box implementations provide a few rudimentary functions, called objects(arglist) and dir(arglist). The function objects() produces a BAT with the object names, possibly limited to those identified by the arglist.

The world of boxes has not been explored deeply yet. It is envisioned that it could play a role to import/export different objects, e.g., introduce xml.take() which converts an XML document to a BAT, jpeg.take() similer for an image.

Nesting boxes is possible. It provides a simple containment scheme between boxes, but in general will interfere with the semantics of each box.

Each box has (should) have an access control list, which names the users having permission to read/write its content. The first one to create the box becomes the owner. He may grant/revoke access to the box to users on a selective basis.

Constant Box

The top level interaction keeps a 'box' with global variables, i.e. each MAL statement is interpreted in an already initialized stack frame. This causes the following problems: 1) how to get rid of global variables and 2) how to deal with variables that can take 'any' type. It is illustrated as follows:

     f:= constant.take("dbname");
     io.print(f);

When executed in the context of a function, the answer will be simple [ nil ]. The reason is that the expected type is not known at compilation time. The correct definition would have been

     f:str:= constant.take("dbname");
     io.print(f);

Session box

Session Box

Aside from box associated with the modules, a session box is created dynamically on behalf of each client. Such boxes are considered private and require access by the user name (and password). At the end of a session they are closed, which means that they are saved in persistent store until the next session starts. For example:

     function m():void;
         box.open("client_name");
         box.deposit("client_name","pi",3.417:flt);
         f:flt := box.take("client_name","pi");
         io.print(t);
         box.close("client_name");
     end function;

In the namespace it is placed subordinate to any space introduced by the system administrator. It will contain global client data, e.g., user, language, database, port, and any other session parameter. The boxes are all collected in the context of the database directory, i.e. the directory <dbfarm>/box


 

MAL interpreter

The MAL interpreter always works in the context of a single user session, which provides for storage access to global variables and modules.

The linkage between MAL interpreter and compiled C-routines is kept as simple as possible. Basically we distinguish four kinds of calling conventions: CMDcall, FCNcall, FACcall, and PATcall. The FCNcall indicates calling a MAL procedure, which leads to a recursive call to the interpreter.

CMDcall initiates calling a linked function, passing pointers to the parameters and result variable, i.e. f(ptr a0,..., ptr aN) The function returns a MAL-SUCCEED upon success and a pointer to an exception string upon failure. Failure leads to raise-ing an exception in the interpreter loop, by either looking up the relevant exception message in the module administration or construction of a standard string.

The PATcall initiates a call which contains the MAL context, i.e. f(MalBlkPtr mb, MalStkPtr stk, InstrPtr pci) The mb provides access to the code definitions. It is primarilly used by routines intended to manipulate the code base itself, such as the optimizers. The Mal stack frame pointer provides access to the values maintained. The arguments passed are offsets into the stack frame rather than pointers to the actual value.

 

MAL runtime stack

The runtime context of a MAL procedure is allocated on the runtime stack of the corresponding interpreter. Access to the elements in the stack are through index offsets, determined during MAL procedure parsing.

The scope administration for MAL procedures is decoupled from their actual runtime behavior. This means we are more relaxed on space allocation, because the size is determined by the number of MAL procedure definitions instead of the runtime calling behavior. (See mal_interpreter for details on value stack management)

The variable names and types are kept in the stack to ease debugging. The underlying string value need not be garbage collected. Runtime storage for variables are allocated on the stack of the interpreter thread. The physical stack is often limited in size, which calls for safeguarding their value and garbage collection before returning. A malicious procedure or implementation will lead to memory leakage.

A system command (linked C-routine) may be interested in extending the stack. This is precluded, because it could interfere with the recursive calling sequence of procedures. To accommodate the (rare) case, the routine should issue an exception to be handled by the interpreter before retrying. All other errors are turned into an exception, followed by continuing at the exception handling block of the MAL procedure.

Debugging

In practice it is hard to write a correct MAL program the first time around. Instead, it is more often constructed by trial-and-error. As long as there are syntax and semantic errors the MAL compiler provides a sufficient handle to proceed. Once it passes the compiler we have to resort to a debugger to assess its behavior.

Note, the MAL debugger described here can be used in conjunction with the textual interface client mclient only. The JDBC protocol does not permit passing through information that 'violates' the protocol.

Attachment

Debugging a running MAL process is simplified with a few hooks in the kernel. It is illustrated with a short example.

First open a client connection using MAL as preferred language. Then the state of the system can be inspected, in particular, the clients active can be looked up.

     mal>b:= clients.getLogins();
     mal>c:= clients.getUsers();
     mal>io.print(b,c);
     #-------------------------------------------------#
     # client        login                           users     # name
     # int   str                             str       # type
     #-------------------------------------------------#
     [ 0,      "Thu Feb  7 15:57:08 2008",     "0"     ]
     [ 1,      "Thu Feb  7 15:57:11 2008",     "0"     ]

Locate the process you are interested in and obtain its identifier, say N (the first column in the list above). The next step is to gracefully put the running process into debugging mode without jeopardizing the application running.

     mal> mdb.setTrap(1);
     #process 1 put to sleep
     mal> mdb.grab();

As soon as the next MAL instruction of process N starts the target process is put to sleep and you can access the context for debugging. The control ends when you leave the debugger with a 'quit' command.

Breakpoints

A powerful mechanism for debugging a program is to set breakpoints during the debugging session. The breakpoints are designated by a target variable name, a [module.]function name, or a MAL line number (#<number>).

The snippet below illustrates the reaction to set a break point on assignment to variable 'i'.

     mal>mdb.start();
     #end main;
     mdb>
     mal>user.test(1);
     #    user.test(1);
     mdb>break i
     breakpoint on 'i' not set
     mdb>n
     #    io.print(i);
     mdb>break i
     mdb>c
     [ 1 ]
     #    i := calc.*(i,2);
     mdb>

The breakpoints remain in effect over multiple function calls. They can be removed with the delete statement. A list of all remaining breakpoints is obtained with breakpoints.

The interpreter can be instructed to call the debugger as soon as an exception is raised. Simply add the instruction mdb.setCatch(true).

Debugger features

To ease debugging and performance monitoring, the MAL interpreter comes with a gdb-like debugger. An illustrative session elicits the functionality offered.

     mal>function test(i:int):str;
     mal>	io.print(i);
     mal>	i:= i*2;
     mal>	b:= bat.new(:int,:int);
     mal>	bat.insert(b,1,i);
     mal>	io.print(b);
     mal>	return test:= "ok";
     mal>end test;
     mal>user.test(1);
     [ 1 ]
     #-----------------#
     # h     t         # name
     # int   int       # type
     #-----------------#
     [ 1,      2       ]

The debugger can be entered at any time using the call mdb.start(). An overview of the available commands is readily available.

     mal>mdb.start();
     #mdb !end main;
     mdb>help
     	next             -- Advance to next statement
     	continue         -- Continue program being debugged
     	catch            -- Catch the next exception
     	break [<var>]    -- set breakpoint on current instruction or <var>
     	delete [<var>]   -- remove break/trace point <var>
     	debug <int>      -- set kernel debugging mask
     	dot <obj> [<file>]  -- generate the dependency graph
     	step             -- advance to next MAL instruction
     	module           -- display a module signatures
     	atom             -- show atom list
     	finish           -- finish current call
     	exit             -- terminate executionr
     	quit             -- turn off debugging
     	list <obj>       -- list current program block
     	List <obj>       -- list with type information
     	span             -- list the life span of variables
     	var  <obj>       -- print symbol table for module
     	optimizer <obj>  -- display optimizer steps
     	print <var>      -- display value of a variable
     	print <var> <cnt>[<first>] -- display BAT chunk
     	info <var>       -- display bat variable properties
     	run              -- restart current procedure
     	where            -- print stack trace
     	down             -- go down the stack
     	up               -- go up the stack
     	trace <var>      -- trace assignment to variables
     	trap <mod>.<fcn> -- catch MAL function call in console
     	set {timer,thread,flow,io,memory,bigfoot} -- set trace switches
     	unset            -- turn off switches
     	help             -- this message
     mdb>

The term <obj> is an abbreviation for a MAL operation <mod>.<fcn>, optionally extended with a version number, i.e. [<nr>]. The var denotes a variable in the current stack frame. Debugger commands may be abbreviated.

We walk our way through a debugging session, highlighting the effects of the debugger commands. The call to mdb.start() has been encapsulated in a complete MAL function, as shown by issuing the list command. A more detailed listing shows the binding to the C-routine and the result of type resolution.

     mal>mdb.start();
     #end main;
     mdb>l
     function user.main():int;
     	mdb.start();
     end main;
     mdb>L
     function user.main():int;       # 0  (main:int)
     	mdb.start();        # 1 MDBstart (_1:void)
     end main;       # 2

The user module is the default place for function defined at the console. The modules loaded can be shown typeing the command 'module' (or 'm' for short). The function signatures become visible using the module and optionally the function name.

     mdb>m alarm
     #command alarm.alarm(secs:int,action:str):void address ALARMsetalarm;
     #command alarm.ctime():str address ALARMctime;
     #command alarm.epilogue():void address ALARMepilogue;
     #command alarm.epoch():int address ALARMepoch;
     #command alarm.prelude():void address ALARMprelude;
     #command alarm.sleep(secs:int):void address ALARMsleep;
     #command alarm.time():int address ALARMtime;
     #command alarm.timers():bat[:str,:str] address ALARMtimers;
     #command alarm.usec():lng address ALARMusec;
     mdb>m alarm.sleep
     #command alarm.sleep(secs:int):void address ALARMsleep;
     mdb>

The debugger mode is left with a <return>. Any subsequent MAL instruction re-activates the debugger to await for commands. The default operation is to step through the execution using the 'next' ('n') or 'step' ('s) commands, as shown below.

     mal>user.test(1);
     #    user.test(1);
     mdb>n
     #    io.print(i);
     mdb>
     [ 1 ]
     #    i := calc.*(i,2);
     mdb>
     #    b := bat.new(:int,:int);
     mdb>

The last instruction shown is next to be executed. The result can be shown using a print statement, which contains the location of the variable on the stack frame, its name, its value and type. The complete stack frame becomes visible with 'values' ('v') command:

     #    bat.insert(b,1,i);
     mdb>
     #    io.print(b);
     mdb>v
     #Stack for 'test' size=32 top=11
     #[0] test        = nil:str
     #[1] i   = 4:int
     #[2] _2  = 0:int   unused
     #[3] _3  = 2:int  constant
     #[4] b   = <tmp_1226>:bat[:int,:int]   count=1 lrefs=1 refs=0
     #[5] _5  = 0:int   type variable
     #[6] _6  = nil:bat[:int,:int]   unused
     #[7] _7  = 1:int  constant
     #[8] _8  = 0:int   unused
     #[9] _9  = "ok":str  constant

The variables marked 'unused' have been introduced as temporary variables, but which are not referenced in the remainder of the program. It also illustrates basic BAT properties, a complete description of which can be obtained using the 'info' ('i') command. A sample of the BAT content can be printed passing tuple indices, e.g. 'print b 10 10' prints the second batch of ten tuples.

Inspection

The debugger commands available for inspection of the program and symbol tables are:

list (List) [<mod>.<fcn>['['<nr>']']]
A listing of the current MAL block, or one designated by the <mod>.<fcn> is produced. The [<nr>] extension provides access to an element in the MAL block history. The alternative name 'List' also produces the type information.
optimizer [<mod>.<fcn>['['<nr>']']]
Gives an overview of the optimizer actions in the history of the MAL block. Intermediate results can be accessed using the list command.
atoms
Lists the atoms currently known
modules [<mod>]
Lists the modules currently known. An optional <mod> argument produces a list of all signatures within the module identified.
dot <mod>.<fcn>['['<nr>']'] [<file>]
A dataflow diagram can be produced using the dot command. It expects a function identifier with an optional history index and produces a file for the Linux program dot, which can produce a nice, multi-page graph to illustrate plan complexity.
     mdb>dot user.main

This example produces the user.main.dot in the current working directory. The program call

     dot -Tpdf user-tst-0.dot -o user-tst-0.pdf

creates a PDF file with the graphs. The Linux pdfposter utility can be used to produce a proper printing. Alternatively, the Adobe reader professional can break it up into multiple pages.

Runtime status

Part of the debugger functionality can also be used directly with MAL instructions. The execution trace of a snippet of code can be visualized encapsulation with mdb.setTrace(true) and mdb.setTrace(false). Likewise, the performance can be monitored with the command mdb.setTimer(on/off). Using a boolean argument makes it easy to control the (performance) trace at run time. The following snippet shows the effect of patching the test case.

     mal>function test(i:int):str;
     mal>	mdb.setTrace(true);
     mal>	io.print(i);
     mal>	i:= i*2;
     mal>	b:= bat.new(:int,:int);
     mal>	bat.insert(b,1,i);
     mal>	io.print(b);
     mal>	mdb.setTrace(false);
     mal>	return test:= "ok";
     mal>end test;
     mal>user.test(1);
     #    mdb.setTrace(_3=true)
     [ 1 ]
     #    io.print(i=1)
     #    i := calc.*(i=2, _5=2)
     #    b := bat.new(_7=0, _8=0)
     #    bat.insert(b=<tmp_1226>, _10=1, i=2)
     #-----------------#
     # h     t         # name
     # int   int       # type
     #-----------------#
     [ 1,      2       ]
     #    io.print(b=<tmp_1226>)
     #   261 usec!    user.test(_2=1)
     mal>

The command mdb.setTimer() toggles the performance traceing flag. The argument is a boolen to designate its state. The primary output of the timer switch is statistics in micro-seconds, the memory tracer shows the arena increment, and the IO tracer shows in- and out-blocks. The time spent on preparing the trace information is excluded from the report. For more detailed timing information the Linux tool valgrind may be of help.

The routines mdb.setFlow(), mdb.setMemory(), and mdb.setIO() (de-)activate the other switches.

     mal>function test(i:int):str;
     mal>	mdb.setTimer(true);
     mal>	io.print(i);
     mal>	i:= i*2;
     mal>	b:= bat.new(:int,:int);
     mal>	bat.insert(b,1,i);
     mal>	io.print(b);
     mal>	mdb.setTimer(false);
     mal>	return test:= "ok";
     mal>end test;
     mal>user.test(1);
     #     6 usec#    mdb.setTimer(_3=true)
     [ 1 ]
     #    43 usec#    io.print(i=1)
     #     5 usec#    i := calc.*(i=2, _5=2)
     #    24 usec#    b := bat.new(_7=0, _8=0)
     #    10 usec#    bat.insert(b=<tmp_1226>, _10=1, i=2)
     #-----------------#
     # h     t         # name
     # int   int       # type
     #-----------------#
     [ 1,      2       ]
     #   172 usec#    io.print(b=<tmp_1226>)
     #   261 usec#    user.test(_2=1)

It is also possible to activate the debugger from within a program using mdb.start(). It remains in this mode until you either issue a quit command, or the command mdb.stop() instruction is encountered. The debugger is only activated when the user can direct its execution from the client interface. Otherwise, there is no proper input channel and the debugger will run in trace mode.

The program listing functionality of the debugger is also captured in the MAL debugger module. The current code block can be listed using mdb.list() and mdb.List(). An arbitrary code block can be shown with mdb.list(module,function) and mdb.List(module,function). A BAT representation of the current function is return by mdb.getDefinition().

The symbol table and stack content, if available, can be shown with the operations mdb.var() and mdb.list(module,function) Access to the stack frames may be helpful in the context of exception handling. The operation mdb.getStackDepth() gives the depth and individual elements can be accessed as BATs using mdb.getStackFrame(n). The top stack frame is accessed using mdb.getStackFrame().

Switches

Switches control the level of detail output shown while debugging or tracing program execution. They are toggled with the set and unset command. The following switches are currently supported:

timer
activates a listing of all instructions being executed. It is measured in wall-clock time.
flow
shows the total byte size of all BAT target results and input arguments. It is a good indicator on the amount of data being processed.
memory
keeps track on growing memory needs.
io
keeps track on the amount of physical IO and is used to detect operators consuming excessive amounts of space.
bigfoot
keeps track of the current and maximum virtual memory footprint of the BATs.[incomplete]

The snippet below shows setting the memory and timer switch. The switches take effect at the next instruction.

     mdb>set timer
     mdb>set flow
     mdb>c
     [ 3 ]
     #    26 usec#   0  0#    io.print(i=3)
     #     6 usec#   0  0#    i := calc.*(i=6, _3=2)
     #    10 usec#   0  0#    b := bat.new(_5=0, _6=0)
     #     7 usec#   0  8#    bat.insert(b=<tmp_167>bat[:int,:int]{1}, _8=1, i=6)
     #-----------------#
     # h     t         # name
     # int   int       # type
     #-----------------#
     [ 1,      6       ]
     #    41 usec#   0  8#    io.print(b=<tmp_167>bat[:int,:int]{1})
     #     7 usec#   0  0#    return test := "ok";
     #   211 usec#   0  0#    user.test(_2=3)
     

Exception handling

Exception handling

Calling a built-in or user-defined routine may lead to an error or a cached status message to be dealt with in MAL. To improve error handling in MAL, an exception handling scheme based on catch-exit blocks. The catch statement identifies a (string-valued) variable, which carries the exception message from the originally failed routine or raise exception assignment. During normal processing catch-exit blocks are simply skipped. Upon receiving an exception status from a function call, we set the exception variable and skip to the first associated catch-exit block. MAL interpretation then continues until it reaches the end of the block. If no exception variable was defined, we should abandon the function alltogether searching for a catch block at a higher layer.

Exceptions raised within a linked-in function requires some care. First, the called procedure does not know anything about the MAL interpreter context. Thus, we need to return all relevant information upon leaving the linked library routine.

Second, exceptional cases can be handled deeply in the recursion, where they may also be handled, i.e. by issueing an GDKerror message. The upper layers merely receive a negative integer value to indicate occurrence of an error somewhere in the calling sequence. We then have to also look into GDKerrbuf to see if there was an error raised deeply inside the system.

The policy is to require all C-functions to return a string-pointer. Upon a successfull call, it is a NULL string. Otherwise it contains an encoding of the exceptional state encountered. This message starts with the exception identifer, followed by contextual details.

Garbage collection is relatively straightforward, because most values are retained on the stackframe of an interpreter call. However, two storage types and possibly user-defined type garbage collector definitions require attention: BATs and strings.

A key issue is to deal with temporary BATs in an efficient way. References to bats in the buffer pool may cause dangling references at the language level. This appears as soons as your share a reference and delete the BAT from one angle. If not carefull, the dangling pointer may subsequently be associated with another BAT

All string values are private to the VALrecord, which means they have to be freed explicitly before a MAL function returns. The first step is to always safe the destination variable before a function call is made.

All operations are responsible to properly set the reference count of the BATs being produced or destroyed. The libraries should not leave the physical reference count being set. This is only allowed during the execution of a GDK operation. All references should be logical.

MAL profiler

A key issue in the road towards a high performance implementation is to understand where resources are being spent. This information can be obtained using different tools and at different levels of abstraction. A coarse grain insight for a particular application can be obtained using injection of the necessary performance capturing statements in the instruction sequence. Fine-grain, platform specific information can be obtained using existing profilers, like valgrind (http://www.valgrind.org), or hardware performance counters.

The MAL profiler collects detailed performance information, such as cpu, memory and statement information. It is optionally extended with IO activity, which is needed for coarse grain profiling only, and estimated bytes read/written by an instruction.

The execution profiler is supported by hooks in the MAL interpreter. The default strategy is to ship an event record immediately over a stream to a separate performance monitor, formatted as a tuple. An alternative strategy is preparation for off-line performance analysis.

Reflective performance analysis is supported by an event cache, the event log becomes available as a series of BATs.

Event filters

The profiler supports selective retrieval of performance information by tagging the instructions of interest. This means that a profiler call has a global effect, all concurrent users are affected by the performance overhead. Therefore, it is of primary interest to single user sessions.

The example below illustrates how the different performance counter groups are activated, instructions are filtered for tracking, and where the profile information is retained for a posteriori analysis.

     #profiler.activate("event");
     #profiler.activate("pc");
     profiler.activate("time,ticks");
     profiler.activate("stmt");
     #profiler.activate("type");
     #profiler.activate("cpu");
     #profiler.activate("memory");
     #profiler.activate("reads");
     #profiler.activate("writes");
     #profiler.activate("obytes");
     #profiler.activate("wbytes");
     #profiler.activate("user");
     profiler.setFilter("*","insert");
     profiler.setFilter("*","print");
     
     profiler.openStream("/tmp/MonetDBevents");
     profiler.start();
     b:= bbp.new(:int,:int);
     bat.insert(b,1,15);
     bat.insert(b,2,4);
     bat.insert(b,3,9);
     io.print(b);
     profiler.stop();
     profiler.closeStream();

In this example, we are interested in all functions name insert and print. A wildcard can be used to signify any name, e.g. no constraints are put on the module in which the operations are defined. Several profiler components are ignored, shown by commenting out the code line.

Execution of the sample leads to the creation of a file with the following content. The ticks are measured in micro-seconds.

# time, ticks,  stmt  # name
[ "15:17:56",   12,   "_27 := bat.insert(<tmp_15>{3},1,15);" ]
[ "15:17:56",   2,    "_30 := bat.insert(<tmp_15>{3},2,4);"  ]
[ "15:17:56",   2,    "_33 := bat.insert(<tmp_15>{3},3,9);"  ]
[ "15:17:56",   245,  "_36 := io.print(<tmp_15>{3});",   ]

Cached events

Aside from shipping events to a separate process, the profiler can keep the events in a local bat group. It is the default when no target file has been opened to collect the information.

Ofcourse, every measurement scheme does not come for free and may even obscure performance measurements obtained through e.g. valgrind. The separate event caches can be accessed using the operator profiler.getTrace(name). The current implementation only supports access to time,ticks,pc,stmt. The event cache can be cleared with profiler.clearTrace().

Consider the following MAL program snippet:

     profiler.setAll();
     profiler.start();
     b:= bbp.new(:int,:int);
     bat.insert(b,1,15);
     io.print(b);
     profiler.stop();
     s:= profiler.getTrace("stmt");
     t:= profiler.getTrace("ticks");
     io.print(s,t);

The performance result of the program execution becomes:

#---------------------------------------------------------#
# h     t                                       t         # name
# int   str                                     int       # type
#---------------------------------------------------------#
[ 1,      "b := bbp.new(0,0);",                   51      ]
[ 2,      "$6 := bat.insert(<tmp_22>,1,15);",     16      ]
[ 3,      "$9 := io.print(<tmp_22>);",            189     ]

7.3 Monitoring Variables

The easiest scheme to obtain performance data is to retrieve the performance properties of an instruction directly after it has been executed using getEvent(). It reads the profiling stack maintained, provided you have started monitoring.

     profiler setFilter(b);
     profiler.start();
     ....
     b:= algebra.select(a,0 1000); # some expensive operation
     (clk, memread, memwrite):= profiler.getEvent();
     ...
     profiler.stop();

7.4 SQL table wrapper

The SQL frontend can access the profiler.

create function tracelog() 
	returns table (
		event integer,		-- event counter
		clk varchar(20), 	-- wallclock, no mtime in kernel
		pc varchar(50), 	-- module.function[nr]
		thread int, 		-- thread identifier
		"user" int, 			-- client identifier
		ticks integer, 		-- time in microseconds
		reads integer, 		-- number of blocks read
		writes integer, 	-- number of blocks written
		rbytes integer,		-- amount of bytes touched
		wbytes integer,		-- amount of bytes written
		type string,		-- return types
		stmt string			-- actual statement executed
	)
	external name sql.dump_trace;

7.5 Security

Profiling the system is a security leak. Cached plans are currently marked for profiling and not yet made private for the user session. Furthermore, all events are assembled in a global performance trace table. This means that concurrent users can tap what is going on in the system.

Dealing with it is major effort affecting both the kernel and the upper layers. For the time being, access to the data is considered priority over the leak.

Stethoscope

The program stethoscope is a simple application that can attach itself to a running server and extracts the profiler events from concurrent running queries. The default stethoscope action is to attach itself to all databases accessible under the same name through monetdb. Listening into the database at a particular server requires the host name argument.

For example, to tracks the microsecond ticks of two specific MAL instructions :

     stethoscope -u monetdb -P monetdb -d sf10 +t bat.insert algebra.join

 A convenient way to watch most of the SQL interaction you may use the command: stethoscope +ts algebra.* bat.* group.* sql.* aggr.*

A synopsis of the calling conventions (default branch Jan2012):

stethoscope [options] +[trace options] {<mod>.<fcn>}
  -d | --dbname=<database_name>
  -u | --user=<user>
  -P | --password=<password>
  -p | --port=<portnr>
  -h | --host=<hostname>

The trace options (default 'ISTest'):
  S = monitor start of instruction profiling
  a = aggregate clock ticks per instruction
  e = event counter
  f = module.function name
  i = instruction counter
  I = interpreter thread number
  T = wall clock time
  t = ticks in microseconds
  c = cpu statistics (utime,ctime,stime,cstime)
  m = memory resources as provided by OS
  r = block reads
  w = block writes
  b = bytes read/written
  s = MAL statement
  y = MAL argument types
  p = process statistics, e.g. page faults, context switches
  u = user id
  D = Generate dot file upon query start
  F = Dataflow memory claims

The dot file (D) option is used in conjunction with a 2D graphical tool to follow the execution and for post-mortem analysis. (not yet released)

Before the stethoscope  starts parsing command line options, it reads a .monetdb file.  If the environment variable DOTMONETDBFILE is set, it reads the file pointed to by that variable instead.  When unset, stethoscope searches for a .monetdb file in the current working directory, and if that doesn't exist, in the current user's home directory.  This file can contain defaults for the flags user, password, language, save_history, format and width .
For example, an entry in a .monetdb file that sets the default user name for stethoscope looks like this:
     user=monetdb
To disable reading the .monetdb file, set the variable DOTMONETDBFILE to the empty string in the environment.

 

 

MAL optimizers

One of the prime reasons to design the MAL intermediate language is to have a high-level description for database queries, which is easy to generate by a front-end compiler and easy to decode, optimize and interpret. In this section, we introduce the collection of MAL optimizers included in the code base. The tool kit is incrementally built, triggered by experimentation and curiousity. Several optimizers require further development to cope with the many features making up the MonetDB system. Such limitations on the implementation are indicated where appropriate.

Experience shows that construction and debugging of a front-end specific optimizer is simplified when you retain information on the origin of the MAL code produced as long as possible. For example, the snippet sql.insert(col, 12@0, "hello") can be the target of simple SQL rewrites using the module name as the discriminator.

Pipeline validation. The pipelines used to optimize MAL programs contain dependencies. For example, it does not make much sense to call the deadcode optimizer too early in the pipeline, although it is not an error. Moreover, some optimizers are merely examples of the direction to take, others are critical for proper functioning for e.g. SQL.

An optimizer needs several mechanisms to be effective. It should be able to perform a symbolic evaluation of a code fragment and collect the result in properties for further decision making. The prototypical case is where an optimizer estimates the result size of a selection. Another major issue is to be able to generate and explore a space of alternative evaluation plans. This exploration may take place up front, but can also be ran at runtime for query fragments

Landscape

A query optimizer is often a large and complex piece of code, which enumerates alternative evaluation plans from which 'the best' plan is selected for evaluation. Limited progress has been made sofar to decompose the optimizer into (orthogonal) components, because it is a common believe in research that a holistic view on the problem is a prerequisite to find the best plan. Conversely, commercial optimizers use a cost-model driven approach, which explores part of the space using a limited (up to 300) rewriting rules.

Our hypothesis is that query optimization should be realized with a collection of query optimizer transformers (QOT), each dedicated to a specific task. Furthermore, they are assembled in scenarios to support specific application domains or achieve a desired behavior. Such scenarios are selected on a session basis, a query basis, or dynamically at runtime; they are part of the query plan.

The query transformer list below is under consideration for development. For each we consider its goal, approach, and expected impact. Moreover, the minimal prerequisites identify the essential optimizers that should have done their work already. For example, it doesn't make sense to perform a static evaluation unless you have already propagated the constants using Alias Removal.

Constant expressions Goal: to remove scalar expressions which need be evaluated once during the query lifetime. Rationale: static expressions appear when variables used denote literal constants (e.g. 1+1), when catalog information can be merged with the plan (e.g. max(B.salary)), when session variables are used which are initialized once (e.g. user()). Early evaluation aids subsequent optimization. Approach: inspect all instructions to locate static expressions. Whether they should be removed depends on the expected re-use, which in most cases call for an explicit request upon query registration to do so. The result of a static evaluation provides a ground for alias removal. Impact: relevant for stored queries (MAL functions) Prereq: alias removal, common terms

Relational Expression Optimizer Goal: to evaluate a relational plan using properties of BATs, such as being empty or forming an aligned group. These optimizations assume that the code generator can detect properties while compiling e.g. an SQL query. Impact: high Prereq:

Alias Removal Goal: to reduce the number of variables referenceing the same value, thereby reducing the analysis complexity. Rationale: query transformations often result in replacing the right-hand side expression with a result variable. This pollutes the code block with simple assignments e.g. V:=T. Within the descendant flow the occurrence of V could be replaced by T, provided V is never assigned a new value. Approach: literal constants within a MAL block are already recognized and replaced by a single variable. Impact: medium.

Common Term Optimizer Goal: to reduce the amount of work by avoiding calculation of the same operation twice. Rationale: to simplify code generation for front-ends, they do not have to remember the subexpressions already evaluated. It is much easier to detect at the MAL level. Approach: simply walk through the instruction sequence and locate identical patterns. (Enhance is with semantic equivalent instructions) Impact: High Prereq: Alias Removal

Dead Code Removal Goal: to remove all instructions whose result is not used Rationale: due to sloppy coding or alternative execution paths dead code may appear. Als XML Pathfinder is expected to produce a large number of simple assignments. Approach: Every instruction should produce a value used somewhere else. Impact: low

Heuristic Rule Rewrites Goal: to reduce the volume as quick as possible. Rationale: most queries are focussed on a small part of the database. To avoid carrying too many intermediates, the selection should be performed as early as possible in the process. This assumes that selectivity factors are known upfront, which in turn depends on histogram of the value distribution. Approach: locate selections and push them back/forth through the flow graph. Impact: high

Join Path Optimizer Goal: to reduce the volume produced by a join sequence Rationale: join paths are potentially expensive operations. Ideally the join path is evaluated starting at the smallest component, so as to reduce the size of the intermediate results. Approach: to successfully reduce the volume we need to estimate their processing cost. This calls for statistics over the value distribution, in particular, correlation histograms. If statistics are not available upfront, we have to restore to an incremental algorithm, which decides on the steps using the size of the relations. Impact: high

Operator Sort Goal: to sort the dataflow graph in such a way as to reduce the cost, or to assure locality of access for operands. Rationale: A simple optimizer is to order the instructions for execution by permutation of the query components Approach: Impact:

Singleton Set Goal: to replace sets that are known to produce precisely one tuple. Rationale: Singleton sets can be represented by value pairs in the MAL program, which reduces to a scalar expression. Approach: Identify a set variable for replacement. Impact:

Range Propagation Goal: look for constant ranges in select statements and propagate them through the code. Rationale: partitioned tables and views may give rise to expressions that contain multiple selections over the same BAT. If their arguments are constant, the result of such selects can sometimes be predicted, or the multiple selections can be cascaded into a single operation. Impact: high, should be followed by alias removal and dead code removal

Result Cacher Goal: to reduce the processing cost by keeping track of expensive to compute intermediate results Rationale: Approach: result caching becomes active after an instruction has been evaluated. The result can be cached as long as its underlying operands remain unchanged. Result caching can be made transparent to the user, but affects the other quer optimizers. Impact: high

Vector Execution Goal: to rewrite a query to use a cache-optimal vector implementation Rationale: processing in the cache is by far the best you can get. However, the operands may far exceed the cache size and should be broken into pieces followed by a staged execution of the fragments involved. Approach: replace the query plan with fragment streamers Impact:

Staged Execution Goal: to split a query plan into a number of steps, such that the first response set is delivered as quickly as possible. The remainder is only produced upon request. Rationale: interactive queries call for quick response and an indication of the processing time involved to run it too completion. Approach: staged execution can be realized using a fragmentation scheme over the database, e.g. each table is replaced by a union of fragments. This fragmentation could be determined upfront by the user or is derived from the query and database statistics. impact: high

Code Parallizer Goal: to exploit parallel IO and cpu processing in both SMP and MPP settings. Rationale: throwing more resources to solve a complex query helps, provided it is easy to determine that parallel processing recovers the administrative overhead Approach: every flow path segment can be handled by an independent process thread. Impact: high

Query Evaluation Maps Goal: to avoid touching any tuple that is not relevant for answering a query. Rationale: the majority of work in solving a query is to disgard tuples of no interest and to find correlated tuples through join conditions. Ideally, the database learns these properties over time and re-organizes (or builts a map) to replace disgarding by map lookup. Approach: piggyback selection and joins as database fragmentation instructions Impact: high

MAL Compiler (tactics) Goal: to avoid interpretation of functional expressions Rationale: interpretation of arithmetic expressions with an interpreter is always expensive. Replacing a complex arithmetic expressin with a simple dynamically compiled C-functions often pays off. Especially for cached (MAL) queries Approach: Impact: high

Dynamic Query Scheduler (tactics) Goal: to organize the work in a way so as to optimize resource usage Rationale: straight interpretation of a query plan may not lead to the best use of the underlying resources. For example, the content of the runtime cache may provide an opportunity to safe time by accessing a cached source Approach: query scheduling is the last step before a relation algebra interpreter takes over control. The scheduling step involves a re-ordering of the instructions within the boundaries imposed by the flow graph. impact: medium

Aggregate Groups Goal: to reduce the cost of computing aggregate expressions over times Rationale: many of our applications call for calculation of aggregates over dynamically defined groupings. They call for lengtly scans and it pays to piggyback all aggregate calculates, leaving their result in the cache for later consumption (eg the optimizers) Approach: Impact: High

Data Cube optimizer Goal: to recognize data cube operations Rationale: Approach: Impact:

Demand Driven Interpreter (tactics) Goal: to use the best interpreter and libraries geared at the task at hand Rationale: Interpretation of a query plan can be based on different computational models. A demand driven interpretation starts at the intended output and 'walks' backward through the flow graph to collect the pieces, possibly in a pipelined fashion. (Vulcano model) Approach: merely calls for a different implementation of the core operators Impact: high

Iterator Strength Reduction Goal: to reduce the cost of iterator execution by moving instructions out of the loop. Rationale: although iteration at the MAL level should be avoided due to the inherent low performance compared to built-in operators, it is not forbidden. In that case we should confine the iterator block to the minimal work needed. Approach: inspect the flowgraph for each iterator and move instructions around. Impact: low

Accumulator Evaluation Goal: to replace operators with cheaper ones. Rationale: based on the actual state of the computation and the richness of the supporting libraries there may exists alternative routes to solve a query. Approach: Operator rewriting depends on properties. No general technique. The first implementation looks at calculator expressions such as they appear frequently in the RAM compiler. Impact: high Prerequisite: should be called after common term optimizer to avoid clashes. Status: Used in the SQL optimizer.

Code Inliner Goal: to reduce the calling depth of the interpreter and to obtain a better starting point for code squeezing Rationale: substitution of code blocks (or macro expansion) leads to longer linear code sequences. This provides opportunities for squeezing. Moreover, at runtime building and managing a stackframe is rather expensive. This should be avoided for functions called repeatedly. Impact: medium Status: Used in the SQL optimizer to handle SQL functions.

Code Outliner Goal: to reduce the program size by replacing a group with a single instruction Rationale: inverse macro expansion leads to shorter linear code sequences. This provides opportunities for less interpreter overhead, and to optimize complex, but repetative instruction sequences with a single hardwired call Approach: called explicitly to outline a module (or symbol) Impact: medium

Garbage Collector Goal: to release resources as quickly as possible Rationale: BATs referenced from a MAL program keep resources locked. Approach: In cooperation with a resource scheduler we should identify those that can be released quickly. It requires a forced gargabe collection call at the end of the BAT's lifespan. Impact: large Status: Implemented. Algorithm based on end-of-life-span analysis.

Foreign Key replacements Goal: to improve multi-attribute joins over foreign key constraints Rationale: the code produced by the SQL frontend involves foreign key constraints, which provides many opportunities for speedy code using a join index. Impact: large Status: Implemented in the SQL strategic optimizer.

Dependencies

5.1.1 Optimizer Dependencies

The optimizers are highly targeted to a particular problem. Aside from the resources available to invest in plan optimization, optimizers are partly dependent and may interfere.

To aid selection of the components of interest, we have grouped them in a preferred order of deployment.

Group A: Code Inliner.
  Constant Expression Evaluator.
  Relational Expression Evaluator.
  Strength Reduction.

 

Group B: Common Term Optimizer.
  Query Evaluation Maps.

 

Group C: Join Path Optimizer.
  Ranges Propagation.
  Operator Cost Reduction.
  Operator Sort.
  Foreign Key handling.
  Aggregate Groups.
  Data Cube optimizer.
  Heuristic Rule Rewrite.

 

group D: Code Parallizer.
  Accumulator Evaluations.
  Result Cacher.
  Replication Manager.

 

group E: MAL Compiler.
  Dynamic Query Scheduler.
  Vector Execution.
  Staged Execution.

 

group F: Alias Removal.
  Dead Code Removal.
  Garbage Collector.

The interaction also excludes combinations. For example, the Accumulator should be used after the Partition optimizer.

Accumulators

Bulk arithmetic calculations are pretty expensive, because new bats are created for each expression. This memory hunger can be reduced by detecting opportunities for accummulator processing, i.e. where a (temporary) variable is overwritten. For example, consider the program snippet

     	t3:= batcalc.*(64,t2);
     	t4:= batcalc,+(t1,t3);
     	optimizer.accumulators();

If variable t2 is a temporary variable and not used any further in the program block, we can re-use its storage space.

     	t3:= batcalc.*(t2,64,t2);
     	t4:= batcalc.+(t1,t3);

The implementation is straight forward. It only deals with the arithmetic operations available in batcalc right now. This set may be gradually extended. The key decision is to determine whether we may overwrite any of the arguments. This is hard to detect at compile time, e.g. the argument may be the result of a binding operation or represent a view over a persistent BAT.

[NOTE the accumulator optimizer is known to produce problems due to concurrent access to BATs. It has been disabled]

Alias removal

The routine optimizer.aliasRemoval() walks through the program looking for simple assignment statements, e.g. V:=W. It replaces all subsequent occurrences of V by W, provided V is assigned a value once and W does not change in the remainder of the code. Special care should be taken for iterator blocks as illustrated in the case below:

	i:=0;
	b:= "done";
barrier go:= true;
	c:=i+1;
	d:="step";
	v:=d;
	io.print(v);
	i:=c;
redo go:= i<2;
exit go;
	io.print(b);
	optimizer.aliasRemoval();

The constant strings are propagated to the

print()

routine, while the initial assigment

i:=0

should be retained. The code block becomes:

	i:=0;
barrier go:= true;
	c:=i+1;
	io.print("step");
	i:=c;
redo go:= i<2;
exit go;
	io.print("done");

A special case is backward propagation of constants. The following snippet is the result of the JITO emptyset. It can be further reduced to avoid useless assignments.

    _53 := sql.bind("sys","_tables","type",0);
    (_54,_56,_58,_60) := bat.partition(_53);
    _53 := nil;
    _67 := _54;
    _54 := nil;
    _75 := _67;
    _67 := nil;
    _83 := _75;
    _75 := nil;

Code factorization

In most real-life situations queries are repeatedly called with only slight changes in their parameters. This situation can be captured by the query compilers by keeping a cache of recent query plans. In MonetDB context such queries are represented as parameterized MAL programs.

To further optimize the cached functions it might help to split the query plan into two sections. One section with those actions that do not depend on the arguments given and another section that contains the heart of the query using all information. Such a program can be represented by a MAL factory, which is a re-entrend query plan.

An example of how factorize changes the code is shown below:

function test(s:str):lng;
    b:= bat.new(:int,:str);
    bat.insert(b,1,"hello");
    z:= algebra.select(b,s,s);
    i:= aggr.count(z);
    return i;
end test;
optimizer.factorize("user","test");

which translates into the following block:

factory user.test(s:str):lng;
    b := bat.new(:int,:str);
    bat.insert(b,1,"hello");
barrier always := true;
    z := algebra.select(b,s,s);
    i := aggr.count(z);
    yield i;
    redo always;
exit always;
end test;

The factorizer included is a prototype implementation of MAL factorization. The approach taken is to split the program into two pieces and wrap it as a MAL factory. The optimization assumes that the database is not changed on tables accessed only once during the factory lifetime. Such changes should be detected from the outside and followed by re-starting the factory.

A refined scheme where the user can identify the 'frozen' parameters is left for the future. As the mapping of a query to any of the possible available factories to deal with the request. For the time being we simple reorganize the plan for all parameters

The factorize operation interferes with optimizer.expressionAccumulation() because that may overwrite the arguments. For the time being, this is captured in a local routine.

Coercions

A simple optimizer that removes coercions that are not needed. They may result from a sloppy code-generator or function call resolution decision. For example:

       v:= calc.int(23);

becomes a single assignment without function call.

The primary role is a small illustration of coding an optimizer algorithm.

Common subexpressions

5.2.6 Common Subexpression Elimination

Common subexpression elimination merely involves a scan through the program block to detect re-curring statements. The key problem to be addressed is to make sure that the parameters involved in the repeatative instruction are invariant.

The analysis of optimizer.commonTerms() is rather crude. All functions with possible side-effects on their arguments should have been marked as 'unsafe'. Their use within a MAL block breaks the dataflow graph for all objects involved (BATs, everything kept in boxes).

The common subexpression optimizer locates backwards the identical instructions. It stops as soon as it has found an identical one. Before we can replace the expression with the variable(s) of the previous one, we should assure that we haven;t passed a non-empty barrier block.

    b:= bat.new(:int,:int);
    c:= bat.new(:int,:int);
    d:= algebra.select(b,0,100);
    e:= algebra.select(b,0,100);
    k1:= 24;
    k2:= 27;
    l:= k1+k2;
    l2:= k1+k2;
    l3:= l2+k1;
    optimizer.commonTerms();

is translated into the code block where the first two instructions are not common, because they have side effects.

    b := bat.new(:int,:int);
    c := bat.new(:int,:int);
    d := algebra.select(b,0,100);
    e := d;
    l := calc.+(24,27);
    l3 := calc.+(l,24);

Constant expressions

5.2.7 Constant Expression Evaluation

Expressions produced by compilers involving only constant arguments can be evaluated once. It is particular relevant in functions that are repeatably called. One time queries would not benefit from this extra step.

Consider the following snippet, which contains recursive use of constant arguments

    a:= 1+1;        io.print(a);
    b:= 2;           io.print(b);
    c:= 3*b;        io.print(c);
    d:= calc.flt(c);io.print(d);
    e:= mmath.sin(d);io.print(e);
    optimizer.aliasRemoval();
    optimizer.evaluate();

The code produced by the optimizer would be

    io.print(2);
    io.print(2);
    io.print(6);
    io.print(6);
    io.print(-0.279415488);

Likewise we attempt to catch barrier blocks based on constants.

Cost model

5.2.8 Costmodel Approach

Cost models form the basis for many optimization decisions. The cost parameters are typically the size of the (intermediate) results and response time. Alternatively, they are running aggregates, e.g. max memory and total execution time, obtained from a simulated run. The current implementation contains a framework and an example for building your own cost-based optimized.

The optimizer.costModel() works its way through a MAL program in search for relational operators and estimates their result size. The estimated size is left behind as the property rows.

    r{rows=100} := bat.new(:oid,:int);
    s{rows=1000}:= bat.new(:oid,:int);
    rs:= algebra.select(s,1,1);
    rr:= bat.reverse(r);
    j:= algebra.join(rs,rr);
    optimizer.costModel();

changes the properties of the instructions as follows:

    r{rows=100} := bat.new(:oid,:int);
    s{rows=1000} := bat.new(:oid,:int);
    rs{rows=501} := algebra.select(s,1,1);
    rr{rows=100} := bat.reverse(r);
    j{rows=100} := algebra.join(rs,rr);

The cost estimation does not use any statistics on the actual data distribution yet. It relies on the rows property provided by the front-end or other optimizers. It just applies a few heuristic cost estimators. However, it ensures that empty results are only tagged with rows=0 if the estimate is accurate, otherwise it assumes at least one result row. This property makes it possible to safely pass the result of the cost estimation to the emptySet optimizer for code reduction.

Data flow

5.2.9 The dataflow optimizer

MAL programs are largely logical descriptions of an execution plan. At least as it concerns side-effect free operations. For these sub-plans the order of execution needs not to be a priori fixed and a dataflow driven evaluation is possible. Even using multiple cores to work their way through the dataflow graph.

The dataflow optimizer analyses the code and wraps all instructions eligible for dataflow driven execution with a guarded block. Ofcourse, this is only necessary if you can upfront determine there are multiple threads of execution possible.

Upon execution, the interpreter instantiates multiple threads based on an the number of processor cores available. Subsequently, the eligible instructions are queued and consumed by the interpreter threads.

Dataflow blocks may not be nested. Therefore, any dataflow block produced for inlined code is removed first.

Initial experiments on e.g. the RDF benchmark showed a speed up of 20\% on average, with a peak of a factor 2 for individual queries. The main reason for this limited gain stems from the little opportunities for parallel execution in the SQL code plans

Dead code

5.2.10 Dead Code Removal

Dead code fragments are recognized by assignments to variables whose value is not consumed any more. It can be detected by marking all variables used as arguments as being relevant. In parallel, we built a list of instructions that should appear in the final result. The new code block is than built in one scan, discarding the superflous instructions.

Instructions that produce side effects to the environment, e.g., printing and BAT updates, should be taken into account. Such (possibly recursive) functions should be marked with a property (unsafe). For now we recognize a few important ones Likewise, instructions marked as control flow instructions should be retained.

An illustrative example is the following MAL snippet:

	V7 := bat.new(:oid,:int);
	V10 := bat.new(:int,:oid);
	V16 := algebra.markH(V7);
	V17 := algebra.join(V16,V7);
	V19 := bat.new(:oid,:int);
	V22 := bat.new(:oid,:int);
	V23 := algebra.join(V16,V22);
	io.print("done");
	optimizer.deadCodeRemoval();

The dead code removal trims this program to the following short block:

	io.print("done");

A refinement of the dead code comes from using arguments that ceased to exist due to actions taken by an optimizer. For example, in the snippet below the pushranges optimizer may conclude that variable V31 becomes empty and simply injects a 'dead' variable by dropping the assignment statement. This makes other code dead as well.

     	V30 := algebra.select( V7, 10,100);
     	V31 := algebra.select(V30,-1,5);
     	V32 := aggr.sum(V31);
     	io.print(V32);

[implementation pending]

Empty sets

5.2.11 Emptyset Reduction

One of the key decisions during MAL optimization is to estimate the size of the BATs produced and consumed. Two cases are of interest for symbolic processing. Namely, when a BAT is known to contain no tuples and those that have precisely one element. Such information may come from application domain knowledge or as a side effect from symbolic evaluation. It is associated with the program under inspection as properties.

The empty set property is used by the reduction algorithm presented here. Any empty set is propagated through the program to arrive at a smaller and therefore faster evaluation.

For example, consider the following MAL test:

         V1 := bat.new(:oid,:int);
         V7 := bat.new(:oid,:int);
         V10{rows=0} := bat.new(:int,:oid);
         V11 := bat.reverse(V10);
         V12 := algebra.kdifference(V7,V11);
         V16 := algebra.markH(V12);
         V17 := algebra.join(V16,V7);
         bat.append(V1,V17);
     	optimizer.costModel();
     	optimizer.emptySet();

Calling the optimizers replaces this program by the following code snippet.

         V1 := bat.new(:oid,:int);
         V7 := bat.new(:oid,:int);
         V10{rows=0} := bat.new(:int,:oid);
         V11{rows=0} := bat.new(:oid,:int);
         V12 := V7;
         V16 := algebra.markH(V12);
         V17 := algebra.join(V16,V7);
         bat.append(V1,V17);
     

This block can be further optimized using alias propagation and dead code removal. The final block becomes:

         V1 := bat.new(:oid,:int);
         V7 := bat.new(:oid,:int);
         V16 := algebra.markH(V7);
         V17 := algebra.join(V16,V7);
         bat.append(V1,V17);

During empty set propagation, new candidates may appear. For example, taking the intersection with an empty set creates a target variable that is empty too. It becomes an immediate target for optimization. The current implementation is conservative. A limited set of instructions is considered. Any addition to the MonetDB instruction set would call for assessment on their effect.

5.2.12 SQL specifics

The bind operations of SQL requires special care, because they refer to containers that might initially be empty, but aren't upon a second call of the query plan. This calls for a defensive approach, where a constraint check is left behind to detect a plan whose conditions are not met anymore. Of course, we can drop the constraint if we know that a plan is used onlye once (and not recursively). This can be marked by the SQL compiler, who is in control over the query cache.

Garbage collector

5.2.13 Garbage Collection

Garbage collection of temporary variables, such as strings and BATs, takes place upon returning from a function call. Especially for BATs this may keep sizable resources locked longer than strictly necessary. Although the programmer can influence their lifespan by assignment of the nil, thereby triggering the garbage collector, it is more appropriate to rely on an optimizer to inject these statements. For, it keeps the program smaller and a better target for code-optimizations.

The operation optimizer.garbageCollector() removes all BAT references that are at their end of life to make room for new ones. It is typically called as one of the last optimizer steps. A snippet of a the effect of the garbage collector:

    t1 := bat.new(:oid,:int);
    t2 := array.grid(132000,8,1,0);
    t3 := array.grid(1,100,10560,0);
    t4 := array.grid(1,100,10560,0,8);
    t5 := batcalc.+(t2,t4);
    t6 := batcalc.oid(t5);
    t7 := algebra.join(t6,t1);
    optimizer.garbageCollector();

is translated into the following code block:

    t1 := bat.new(:oid,:int);
    t2 := array.grid(132000,8,1,0);
    t3 := array.grid(1,100,10560,0);
    t4 := array.grid(1,100,10560,0,8);
    t5 := batcalc.+(t2,t4);
    bat.setGarbage(t2);
    bat.setGarbage(t4);
    t6 := batcalc.oid(t5);
    bat.setGarbage(t5);
    t7 := algebra.join(t6,t1);
    bat.setGarbage(t6);
    bat.setGarbage(t1);

The current algorithm is straight forward. After each instruction, we check whether its BAT arguments are needed in the future. If not, we inject a garbage collection statement to release them, provided there are no other reasons to retain it. This should be done carefully, because the instruction may be part of a loop. If the variable is defined inside the loop, we can safely remove it.

Heuristic rules

5.2.14 Heuristic rewrites rules

One of the oldest optimizer tricks in relational query processing is to apply heuristic rules to reduce the processing cost. For example, a selection predicate is pushed through another operator to reduce the number of tuples to consider. Heuristic rewrites are relatively easy to apply in a context where the expression is still close to a relational algebra tree. Therefore, many of the standard rewrite rules are already applied by the SQL front-end as part of its strategic optimization decisions.

Finding rewrite opportunities within a linear MAL program may be more difficult. For example, the pattern should respect the flow of control that may already be introduced. The last resort for the optimizer builder is to write a C-function to look for a pattern of interest and transform it. The code base contains an example how to built such user specific optimizer routines. It translates the pattern:

     y:= reverse(R);
     z:= select(y,l,h);

into the statement:

     z:= selectHead(x,R,l,h

Join paths

5.2.15 Join Paths

The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider

	a:= bat.new(:oid,:oid);
	b:= bat.new(:oid,:oid);
	c:= bat.new(:oid,:str);
	j1:= algebra.join(a,b);
	j2:= algebra.join(j1,c);
	j3:= algebra.join(b,b);
	j4:= algebra.join(b,j3);

The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.

	a:= bat.new(:oid,:oid);
	j1:= algebra.join(a,b);
	j2:= algebra.joinPath(a,b,c);
	j3:= algebra.join(b,b);
	j4:= algebra.joinPath(b,b,b);

The same operation is applied to sequences of leftjoin operations.

Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.

In principle, the joinpaths may contain common subpaths, whose materialization would improve performance.

The SQL front-end produces often produces snippets of the following structure

    t1:= algebra.join(b,c);
    z1:= algebra.join(a,t1);
	...
    t2:= algebra.join(b,d);
    z2:= algebra.join(a,t2);

The joinpath would merge them into

    z1:= algebra.joinPath(a,b,c);
	...
    z2:= algebra.joinPath(a,b,d);

which are handle by a heuristic looking at the first two argments and re-uses a materialized join.

    _13:= algebra.join(a,b);
    z1:= algebra.join(_13,c);
	...
    z2:= algebra.join(_13,d);

An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.

    x3:= algebra.join(a,b);
	r3:= bat.reverse(x3);
	j1:= join(c,r3);
	rb:= bat.reverse(b);
	ra:= bat.reverse(a);
	j1:= algebra.joinpath(c,rb,ra);

As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.

Building blocks

5.1.2 Optimizer Building Blocks

Some instructions are independent of the execution context. In particular, expressions over side-effect free functions with constant parameters could be evaluated before the program block is considered further.

A major task for an optimizer is to select instruction (sequences) which can and should be replaced with cheaper ones. The cost model underlying this decision depends on the processing stage and the overall objective. For example, based on a symbolic analysis their may exist better implementations within the interpreter to perform the job (e.g. hashjoin vs mergejoin). Alternative, expensive intermediates may be cached for later use.

Plan enumeration is often implemented as a Memo structure, which designates alternative sub-plans based on a cost metric. Perhaps we can combine these memo structures into a large table for all possible combinations encountered for a user.

The MAL language does not imply a specific optimizer to be used. Its programs are merely a sequence of specifications, which is interpreted by an engine specific to a given task. Activation of the engine is controlled by a scenario, which currently includes two hooks for optimization; a strategic optimizer and a tactical optimizer. Both engines take a MAL program and produce a (new/modified) MAL program for execution by the lower layers.

MAL programs end-up in the symbol table linked to a user session. An optimizer has the freedom to change the code, provided it is known that the plan derived is invariant to changes in the environment. All others lead to alternative plans, which should be collected as a trail of MAL program blocks. These trails can be inspected for a posteriori analysis, at least in terms of some statistics on the properties of the MAL program structures automatically. Alternatively, the trail may be pruned and re-optimized when appropriate from changes in the environment.

The rule applied for all optimizers is to not-return before checking the state of the MAL program, and to assure the dataflow and variable scopes are properly set. It costs some performance, but the difficulties that arise from optimizer interference are very hard to debug. One of the easiest pitfalls is to derive an optimized version of a MAL function while it is already referenced by or when polymorphic typechecking is required afterwards.

5.1.3 Building Your Own Optimizer

Implementation of your own MAL-MAL optimizer can best be started from refinement of one of the examples included in the code base. Beware that only those used in the critical path of SQL execution are thorouhly tested. The others are developed up to the point that the concept and approach can be demonstrated.

The general structure of most optimizers is to actively copy a MAL block into a new program structure. At each step we determine the action taken, e.g. replace the instruction or inject instructions to achieve the desired goal.

A tally on major events should be retained, because it gives valuable insight in the effectiveness of your optimizer. The effects of all optimizers is collected in a system catalog.

Each optimizer ends with a strong defense line, optimizerCheck() It performs a complete type and data flow analysis before returning. Moreover, if you are in debug mode, it will keep a copy of the plan produced for inspection. Studying the differences between optimizer steps provide valuable information to improve your code.

The functionality of the optimizer should be clearly delineated. The guiding policy is that it is always safe to not apply an optimizer step. This helps to keep the optimizers as independent as possible.

It really helps if you start with a few tiny examples to test your optimizer. They should be added to the Tests directory and administered in Tests/All.

Breaking up the optimizer into different components and grouping them together in arbitrary sequences calls for careful programming.

One of the major hurdles is to test interference of the optimizer. The test set is a good starting point, but does not garantee that all cases have been covered.

In principle, any subset of optimizers should work flawlessly. With a few tens of optimizers this amounts to potential millions of runs. Adherence to a partial order reduces the problem, but still is likely to be too resource consumptive to test continously.

Data flow

In many optimization rules, the data flow dependency between statements is of crucial importance. The MAL language encodes a multi-source, multi-sink dataflow network. Optimizers typically extract part of the workflow and use the language properties to enumerate semantic equivalent solutions, which under a given cost model turns out to result in better performance.

The flow graph plays a crucial role in many optimization steps. It is unclear as yet what primitives and what storage structure is most adequate. For the time being we introduce the operations needed and evaluate them directly against the program

For each variable we should determine its scope of stability. End-points in the flow graph are illustrative as dead-code, that do not produce persistent data. It can be removed when you know there are no side-effect.

Side-effect free evaluation is a property that should be known upfront. For the time being, we assume it for all operations known to the system. The property "unsafe" is reserved to identify cases where this does not hold. Typically, a bun-insert operation is unsafe, as it changes one of the parameters.

Framework

5.1.4 Optimizer framework

The large number of query transformers calls for a flexible scheme for the deploy them. The approach taken is to make all optimizers visible at the language level as a signature optimizer.F() and optimizer.F(mod,fcn). The latter designates a target function to be inspected by the optimizer F(). Then (semantic) optimizer merely inspects a MAL block for their occurrences and activitates it.

The optimizer routines have access to the client context, the MAL block, and the program counter where the optimizer call was found. Each optimizer should remove itself from the MAL block.

The optimizer repeatedly runs through the program until no optimizer call is found.

Note, all optimizer instructions are executed only once. This means that the instruction can be removed from further consideration. However, in the case that a designated function is selected for optimization (e.g., commonTerms(user,qry)) the pc is assumed 0. The first instruction always denotes the signature and can not be removed.

To safeguard against incomplete optimizer implementations it is advisable to perform an optimizerCheck at the end. It takes as arguments the number of optimizer actions taken and the total cpu time spent. The body performs a full flow and type check and re-initializes the lifespan administration. In debugging mode also a copy of the new block is retained for inspection.

Lifespans

Optimizers may be interested in the characteristic of the barrier blocks for making a decision. The variables have a lifespan in the code blocks, denoted by properties beginLifespan,endLifespan. The beginLifespan denotes the intruction where it receives its first value, the endLifespan the last instruction in which it was used as operand or target.

If, however, the last use lies within a BARRIER block, we can not be sure about its end of life status, because a block redo may implictly revive it. For these situations we associate the endLifespan with the block exit.

In many cases, we have to determine if the lifespan interferes with a optimization decision being prepared. The lifespan is calculated once at the beginning of the optimizer sequence. It should either be maintained to reflect the most accurate situation while optimizing the code base. In particular, it means that any move/remove/addition of a MAL instruction calls for either a recalculation or further propagation. Unclear what will be the best strategy. For the time being we just recalc.

See is all arguments mentioned in the instruction at point pc are still visible at instruction qc and have not been updated in the mean time. Take into account that variables may be declared inside a block. This can be calculated using the BARRIER/CATCH and EXIT pairs.

The safety property should be relatively easy to determine for each MAL function. This calls for accessing the function MAL block and to inspect the arguments of the signature.

Any instruction may block identification of a common subexpression. It suffices to stumble upon an unsafe function whose parameter lists has a non-empty intersection with the targeted instruction. To illustrate, consider the sequence

     L1 := f(A,B,C);
     ...
     G1 := g(D,E,F);
     ...
     l2:= f(A,B,C);
     ...
     L2:= h()

The instruction G1:=g(D,E,F) is blocking if G1 is an alias for {A,B,C}. Alternatively, function g() may be unsafe and {D,E,F} has a non-empty intersection with {A,B,C}. An alias can only be used later on for readonly (and not be used for a function with side effects).

Macro processing

5.2.16 Macro and Orcam Processing

The optimizers form the basis for replacing code fragments. Two optimizers are focused on code expansion and contraction. The former involves replacing individual instructions by a block of MAL code, i.e. a macro call. The latter depicts the inverse operation, a group of instructions is replaced by a single MAL assignment statement, i.e. a orcam call.

The macro facility is limited to type-correct MAL functions, which means that replacement is not essential from a semantic point of view. They could have been called, or the block need not be compressed. It is equivalent to inline code expansion.

The macro and orcam transformations provide a basis to develop front-end specific code generation templates. The prototypical test case is the following template:

     function user.joinPath( a:bat[:any_1,:any_2],
                     b:bat[:any_2,:any_3],
                     c:bat[:any_3,:any_4]):bat[:any_1,:any_4]
     address fastjoinpath;
         z:= join(a,b);
         zz:= join(z,c);
         return zz;
     end user.joinPath;

The call optimizer.macro("user", "joinPath") hunts for occurrences of the instruction call in the block in which it is called and replaces it with the body, i.e. it in-lines the code. Conversely, the optimizer.orcam("user", "joinPath") attempts to localize a block of two join operations and, when found, it is replaced by the direct call to joinPath. In turn, type resolution then directs execution to a built-in function fastjoinpath.

The current implementation is limited to finding a consecutive sequence, ending in a return-statement. The latter is needed to properly embed the result in the enclosed environment. It may be extended in the future to consider the flow of control as well.

5.2.17 Known issues

The functions subject to expansion or contraction should be checked on 'proper' behavior.

The current implementation is extremely limited. The macro optimizer does not recognize use of intermediate results outside the block being contracted. This should be checked and it should block the replacement, unless the intermediates are part of the return list. Likewise, we assume here that the block has a single return statement, which is also the last one to be executed.

The macro optimizer can only expand functions. Factories already carry a significant complex flow of control that is hard to simulate in the nested flow structure of an arbitrary function.

The orcam optimizer can not deal with calls controlled by a barrier. It would often require a rewrite of several other statements as well.

     pattern optimizer.macro(targetmod:str,targetfcn:str):void
     address OPTmacro
     comment "Inline the code of the target function.";
     pattern optimizer.macro(mod:str,fcn:str,targetmod:str,targetfcn:str):void
     address OPTmacro
     comment "Inline a target function used in a specific function.";
     
     pattern optimizer.orcam(targetmod:str,targetfcn:str):void
     address OPTorcam
     comment "Inverse macro processor for current function";
     pattern optimizer.orcam(mod:str,fcn:str,targetmod:str,targetfcn:str):void
     address OPTorcam
     comment "Inverse macro, find pattern and replace with a function call.";
     

Memoization

5.3 Memo-based Query Execution

Modern cost-based query optimizers use a memo structure to organize the search space for an efficient query execution plan. For example, consider an oid join path 'A.B.C.D'. We can start the evaluation at any point in this path.

Its memo structure can be represented by a (large) MAL program. The memo levels are encapsulated with a choice operator. The arguments of the second dictate which instructions to consider for cost evaluation.

     	...
     	scheduler.choice("getVolume");
     	T1:= algebra.join(A,B);
     	T2:= algebra.join(B,C);
     	T3:= algebra.join(C,D);
     	scheduler.choice("getVolume",T1,T2,T3);
     	T4:= algebra.join(T1,C);
     	T5:= algebra.join(A,T2);
     	T6:= algebra.join(T2,D);
     	T7:= algebra.join(B,T3);
     	T8:= algebra.join(C,D);
     	scheduler.choice("getVolume",T4,T5,T6,T7,T8);
     	T9:= algebra.join(T4,D);
     	T10:= algebra.join(T5,D);
     	T11:= algebra.join(A,T6);
     	T12:= algebra.join(A,T7);
     	T13:= algebra.join(T1,T8);
     	scheduler.choice("getVolume",T9,T10,T11,T12,T13);
     	answer:= scheduler.pick(T9, T10, T11, T12, T13);

The scheduler.choice() operator calls a builtin getVolume for each target variable and expects an integer-valued cost. In this case it returns the total number of bytes uses as arguments.

The target variable with the lowest cost is chosen for execution and remaining variables are turned into a temporary NOOP operation.(You may want to re-use the memo) They are skipped by the interpreter, but also in subsequent calls to the scheduler. It reduces the alternatives as we proceed in the plan.

A built-in naive cost function is used. It would be nice if the user could provide a private cost function defined as a pattern with a polymorphic argument for the target and a :lng result. Its implementation can use the complete context information to make a decision. For example, it can trace the potential use of the target variable in subsequent statements to determine a total cost when this step is taken towards the final result.

A complete plan likely includes other expressions to prepare or use the target variables before reaching the next choice point. It is the task of the choice operator to avoid any superfluous operation.

The MAL block should be privately owned by the caller, which can be assured with scheduler.isolation().

A refinement of the scheme is to make cost analysis part of the plan as well. Then you don't have to include a hardwired cost function.

     	Acost:= aggr.count(A);
     	Bcost:= aggr.count(B);
     	Ccost:= aggr.count(C);
     	T1cost:= Acost+Bcost;
     	T2cost:= Bcost+Ccost;
     	T3cost:= Ccost+Dcost;
     	scheduler.choice(T1cost,T1, T2cost,T2, T3cost,T3);
     	T1:= algebra.join(A,B);
     	T2:= algebra.join(B,C);
     	T3:= algebra.join(C,D);
     	...

Merge tables

5.3.1 Merge Tables

A merge association table (MAT) descriptor defines an ordered collection of type compatible BATs, whose union represents a single (virtual) BAT. The MAT may represent a partitioned BAT (see BPM), but could also be an arbitrary collection of temporary BATs within a program fragment.

The MAT definition lives within the scope of a single block. The MAT optimizer simply expands the plan to deal with its components on an instruction basis. Only when a blocking operator is encounted, the underlying BAT is materialized.

The MAT object cannot be passed as an argument to any function without first being materialized. Simply because the MAT is not known by the type system and none of the lower level operations are aware of its existence.

In the first approach of the MAT optimizer we assume that the first BAT in the MAT sequence is used as an accumulator. Furthermore, no semantic knowledge is used to reduce the possible superflous (semi)joins. Instead, we limit expansion to a single argument. This is changed at a later stage when a cost-based evaluation can be used to decide different.

To illustrate, consider:

     	m0:= bat.new(:oid,:int);
     	m1:= bat.new(:oid,:int);
     	m2:= bat.new(:oid,:int);
     	b := mat.new(m0,m1,m2);
     	s := algebra.select(b,1,3);
     	i := aggr.count(s);
     	io.print(s);
     	io.print(i);
     	c0 := bat.new(:int,:int);
     	c1 := bat.new(:int,:int);
     	c := mat.new(c0,c1);
     	j := algebra.join(b,c);
     	io.print(j);

The selection and aggregate operations can simply be rewritten using a MAT:

     	_33 := algebra.select(m0,1,3);
     	_34 := algebra.select(m1,1,3);
     	_35 := algebra.select(m2,1,3);
         s := mat.new(_33,_34,_35);
         i := 0:int;
         _36 := aggr.count(_33);
         i := calc.+(i,_36);
         _37 := aggr.count(_34);
         i := calc.+(i,_37);
         _38 := aggr.count(_35);
         i := calc.+(i,_38);
         io.print(i);

The print operation does not have MAT semantics yet. It requires a function that does not produce the header with each call. Instead, we can also pack the elements before printing.

         s := mat.pack(_33,_34,_35);
         io.print(s);

For the join we have to generate all possible combinations, not knowing anything about the properties of the components. The current heuristic is to limit expansion to a single argument. This leads to

         b := mat.pack(m0,m1,m2);
         _39 := algebra.join(b,c0);
         _40 := algebra.join(b,c1);
         j := mat.new(_39,_40);

The drawback of the scheme is the potential explosion in MAL statements. A challenge of the optimizer is to find the minimum by inspection of the properties of the MAT elements. For example, it might attempt to partially pack elements before proceding. This would be a runtime scheduling decision.

Alternatively, the system could use MAT iterators to avoid packing at the cost of more complex program analysis afterwards.

     	ji:= bat.new(:oid,:int);
     barrier b:= mat.newIterator(m0,m1,m2);
     barrier c:= mat.newIterator(c0,c1);
     	ji := algebra.join(b,c);
     	bat.insert(j,ji);
     	redo c:= mat.newIterator(c0,c1);
     	redo b:= mat.newIterator(m0,m1,m2);
     exit c;
     exit b;

Multiplex functions

5.3.2 Multiplex Compilation

The MonetDB operator multiplex concept has been pivotal to easily apply any scalar function to elements in a containers. Any operator cmd came with its multiplex variant [cmd]. Given the signature cmd(T1,..,Tn) : Tr, it could be applied also as [CMD](bat[:any_1,:T1],...,bat[any_1,Tn]) :bat[any_1,Tr].

The semantics of the multiplex is to perform the positional join on all bat-valued parameters, and to execute the CMD for each combination of matching tuples. All results are collected in a result BAT. All but one argument may be replaced by a scalar value.

The generic solution to the multiplex operators is to translate them to a MAL loop. A snippet of its behaviour:

    b:= bat.new(:int,:int);
    bat.insert(b,1,1);
    c:bat[:int,:int]:= mal.multiplex("calc.+",b,1);
	optimizer.multiplex();

The current implementation requires the target type to be mentioned explicitly. The result of the optimizer is:

    b := bat.new(:int,:int);
    bat.insert(b,1,1);
    _8 := bat.new(:int,:int);
barrier (_11,_12,_13):= bat.newIterator(b);
    _15 := calc.+(_13,1);
    bat.insert(_8,_12,_15);
    redo (_11,_12,_13):= bat.hasMoreElements(b);
exit (_11,_12,_13);
    c := _8;

Partitions

5.3.3 BAT Partitions

Limitations on the addressing space in older PCs and the need for distributed storage makes that BATs ideally should be looked upon as a union of smaller BATs which are processed within the (memory) resource limitations given.

The partition() optimizer with the supportive bat partition library bpm addresses the issue with an adaptive database segmentation algorithm. It is designed incrementally with a focus on supporting the SQL front-end. In particularly, the operators considered is a limited subset of MAL. Occurrence of an operator outside this set terminates the optimizer activities.

The operation optimizer.partitions() hunts for bindings of SQL column BATs and prepare code for using partitioned versions instead.

We use two implementations. The first one attempts to find segments of linear dependent data and builds an iterator around it. This approach is tricky, because you have to take care of special cases. In particular, the semantics of the operators on the sequence construction posed quite some problems.

The naive() approach simply looks at individual operations and surround them with an iterator. An alias table is kept around for re-use and detect already partitioned operands. The drawback is that potentially a partitioned BAT is read multiple times [it depends on the re-use of variables, which can be calculated] and write+read of intermediates. Experiments should demonstrate the optimal one.

Peep hole

5.3.4 Peephole optimization

Recursive descend query compilers easily miss opportunities for better code generation, because limited context is retained or lookahead available. The peephole optimizer is built around such recurring patterns and compensates for the compilers 'mistakes'. The collection of peephole patterns should grow over time and front-end specific variations are foreseen.

The SQL frontend heavily relies on a pivot table, which is a generated oid sequence. Unfortunately, this is not seen and the pattern '$i := calc.oid(0@0); $j:= algebra.markT($k,$i);' occurs often. This can be replaced with '$j:= algebra.markT($k)';

Another example of a 2-way instruction sequence produced is then '$j:= algebra.markT($k); $l:= bat.reverse($j);', which can be replaced by '$l:= algebra.markH($k);'.

The reverse-reverse operation also falls into this category. Reversal pairs may result from the processing scheme of a front-end compiler or from a side-effect from other optimization steps. Such reversal pairs should be removed as quickly as possible, so as to reduce the complexity of finding alternative optimization opportunities. As in all cases we should ensure that the intermediates dropped are not used for other purposes as well.

	r:bat[:int,:int]:= bat.new(:int,:int);
	o:= calc.oid(0@0);
	z:= algebra.markT(r,o);
	rr:= bat.reverse(z);
	s := bat.reverse(r);
	t := bat.reverse(s);
	io.print(t);
	optimizer.peephole();

which is translated by the peephole optimizer into:

	r:bat[:int,:int] := bat.new(:int,:int);
	rr := algebra.markH(r);
	io.print(r);

Another example is the combination of a BAT partition operation followed by a re-construction without using the partitions individually.

Query plans

5.3.5 Query Execution Plans

A commonly used data structure to represent and manipulate a query is a tree (or graph). Its nodes represent the operators and the leaves the operands. Such a view comes in handy when you have to re-organize whole sections of code or to built-up an optimized plan bottom up, e.g. using a memo structure.

The MAL optimizer toolkit provides functions to overlay the body of any MAL block with a tree (graph) structure and to linearize them back into a MAL block. The linearization order is determined by a recursive descend tree walk from the anchor points in the source program.

To illustrate, consider the code block:

     #T1:= bat.new(:int,:int);
     #T2:= bat.new(:int,:int);
     #T3:= bat.new(:int,:int);
     #T4:= bat.new(:int,:int);
     a:= algebra.select(T1,1,3);
     b:= algebra.select(T2,1,3);
     c:= algebra.select(T3,0,5);
     d:= algebra.select(T4,0,5);
     e:= algebra.join(a,c);
     f:= algebra.join(b,d);
     h:= algebra.join(e,f);
     optimizer.dumpQEP();

which produces an indented structure of the query plan.

     h := algebra.join(e,f);
         e := algebra.join(a,c);
             a := algebra.select(T1,1,3);
                 T1 := bat.new(:int,:int);
             c := algebra.select(T3,0,5);
                 T3 := bat.new(:int,:int);
         f := algebra.join(b,d);
             b := algebra.select(T2,1,3);
                 T2 := bat.new(:int,:int);
             d := algebra.select(T4,0,5);
                 T4 := bat.new(:int,:int);

Any valid MAL routine can be overlayed with a tree (graph) view based on the flow dependencies, but not all MAL programs can be derived from a simple tree. For example, the code snippet above when interpreted as a linear sequence can not be represented unless the execution order itself becomes an operator node itself.

However, since we haven't added or changed the original MAL program, the routine qep.propagate produces the orginial program, where the linear order has priority. If, however, we had entered new instructions into the tree, they would have been placed in close proximity of the other tree nodes.

Special care is given to the flow-of-control blocks, because to produce a query plan section that can not easily be moved around. [give dot examples]

 

Range propagation

5.3.6 Range Propagation

Almost all queries are interested in a few slices of the table. If applied to a view, the query plans often contain multiple selections over the same column. They may also have fixed range arguments comming from fragmentation criteria.

The purpose of the pushranges optimizer is to minimize the number of table scans by cascading the range terms as much as possible. Useless instructions are removed from the plan.

         b := bat.new(:oid,:int);
         s1:= algebra.select(b,1,100);
         s2:= algebra.select(s1,5,95);
         s3:= algebra.select(s2,50,nil);
         s4:= algebra.select(s3,nil,75);
         optimizer.pushranges();

This lengthly sequence can be compressed into a single one:

         b := bat.new(:oid,:int);
         s1:= algebra.select(b,50,75);

A union over two range selections from a single source could also be a target.

         t1:= algebra.select(b,1,10);
         t2:= algebra.select(b,0,5);
         t3:= algebra.union(t1,t2);

would become

        t3:= algebra.select(0,10);

Recycler

5.3.7 The recycler

Query optimization and processing in off-the-shelf database systems is often still focused on individual queries. The queries are analyzed in isolation and ran against a kernel regardless opportunities offered by concurrent or previous invocations.

This approach is far from optimal and two directions to improve are explored: materialized views and (partial) result-set reuse. Materialized views are derived from query logs. They represent common sub-queries, whose materialization improves subsequent processing times. Re-use of (partial) results is used in those cases where a zooming-in or navigational application is at stake.

The Recycler optimizer and module extends this with a middle out approach. They exploit the materialize-all-intermediate approach of MonetDB by deciding to keep a hold on them as long as deemed beneficial.

The approach taken is to mark the instructions in a MAL program using the recycler optimizer call, such that their result is retained in a global recycle cache hardwired in the MAL interpreter. Instructions become subject to the Recycler if at least one of its arguments is a BAT and all others are either constants or variables already known in the Recycler.

Upon execution, the recycler is called from the inner loop of the MAL interpreter to first check for an up-to-date result to be picked up at no cost. Otherwise, it evaluates the instruction and calls upon policy functions to decide if it is worthwhile to keep.

The Recycler comes with a few policy controlling operators to experiment with its effect in concrete settings. The retain policy controls when to keep results around, the reuse policy looks after exact duplicate instructions or uses semantical knowledge on MAL instructions to detect potential reuse gain (e.g. reuse select results). And finally, the cache policy looks after the storage space for the intermediate result pool. The details are described in the recycle module.

     pattern optimizer.recycle():str
     address OPTrecycle;
     pattern optimizer.recycle(mod:str, fcn:str):str
     address OPTrecycle
     comment "Replicator code injection";
     

Remote actions

5.3.8 Remote Queries

MAL variables may live at a different site from where they are used. In particular, the SQL front-end uses portions of remote BATs as replication views. Each time such a view is needed, the corresponding BAT is fetched and added to the local cache.

Consider the following snippet produced by a query compiler,

mid:= mapi.reconnect("s0_0","localhost",50000,"monetdb","monetdb","mal");
b:bat[:oid,:int] := mapi.bind(mid,"rvar");
c:=algebra.select(b,0,12);
io.print(c);
d:=algebra.select(b,5,10);
low:= 5+1;
e:=algebra.select(d,low,7);
i:=aggr.count(e);
io.printf(" count %d\n",i);
io.print(d);

which uses a BAT

rvar

stored at the remote site

db1

.

There are several options to execute this query. The remote BAT can be fetched as soon as the bind operation is executed, or a portion can be fetched after a remote select, or the output for the user could be retrieved. An optimal solution depends on the actual resources available at both ends and the time to ship the BAT.

The remote query optimizer assumes that the remote site has sufficient resources to handle the instructions. For each remote query it creates a private connection. It is re-used in subsequent calls .

The remote environment is used to execute the statements. The objects are retrieved just before they are locally needed.

mid:= mapi.reconnect("s0_0","localhost",50000,"monetdb","monetdb","mal");
mapi.rpc(mid,"b:bat[:oid,:int] :=bbp.bind(\"rvar\");");
mapi.rpc(mid,"c:=algebra.select(b,0,12);");
c:bat[:oid,:int]:= mapi.rpc(mid, "io.print(c);");
io.print(c);
mapi.rpc(mid,"d:=algebra.select(b,5,10);");
low:= 5+1;
mapi.put(mid,"low",low);
mapi.rpc(mid,"e:=algebra.select(d,low,7);");
mapi.rpc(mid,"i:=aggr.count(d);");
i:= mapi.rpc(mid,"io.print(i);");
io.printf(" count %d\n",i);
io.print(d);

To reduce the number of interprocess communications this code can be further improved by glueing the instructions together when until the first result is needed.

Singleton sets

5.3.9 Singleton Set Reduction

Application semantics and precise cost analysis may identify the result of an operation to produce a BAT with a single element. Such variables can be tagged with the property singleton, whereafter the operation optimizer.singleton() derives an MAL program using a symbolic evaluation as far as possible.

During its evaluation, more singleton sets can be created, leading to a ripple effect through the code. A non-optimizable instruction leads to a construction of a new table with the single instance.

 

	b:= bat.new(:int,:int);
	bat.insert(b,1,2);
	c{singleton}:= algebra.select(b,0,4);
	d:= algebra.markH(c);
	io.print(d);
	optimizer.singleton();

is translated by into the code block

	b := bat.new(:int,:int);
	bat.insert(b,1,2);
	c{singleton} := algebra.select(b,0,4);
	(_15,_16):= bat.unpack(c{singleton});
	d := bat.pack(nil,_16);
	io.print(d);

Stack reduction

5.3.10 Stack Reduction

The compilers producing MAL may generate an abundance of temporary variables to hold the result of expressions. This leads to a polution of the runtime stack space, because space should be allocated and garbage collection tests should be performed.

The routine optimizer.reduce() reduces the number of scratch variables to a minimum. All scratch variables of the same underlying type share the storage space. The result of this optimizer can be seen using the MonetDB debugger, which marks unused variables explicitly. Experience with the SQL front-end shows, that this optimization step easily reduces the stack consumption by over 20%.

This optimizer needs further testing. Furthermore, the other optimizers should be careful in setting the isused property, or this property can again be easily derived.

Strength reduction

5.3.11 Strength Reduction

An effective optimization technique in compiler construction is to move invariant statements out of the loops. The equivalent strategy can be applied to the guarded blocks in MAL programs. Any variable introduced in a block and assigned a value using a side-effect free operation is a candidate to be moved. Furthermore, it may not be used outside the block and the expression may not depend on variables assigned a value within the same block.

	j:= "hello world";
barrier go:=true;
	i:= 23;
	j:= "not moved";
	k:= j; 
	io.print(i);
	redo go:= false;
exit go;
	z:= j;
optimizer.strengthReduction();

which is translated into the following code:

	j := "hello world";
	i := 23;
barrier go := true;
	j := "not moved";
	k := j;
	io.print(i);
	redo go:= false;
exit go;
	z:= j;

Application is only applicable to loops and not to guarded blocks in general, because execution of a statement outside the guarded block consumes processing resources which may have been prohibited by the block condition.

For example, it doesn't make sense to move creation of objects outside the barrier.

MAL modules

This section contains a synopsis of the modules being shipped and which use knowledge of the MAL runtime context. They are sorted by module name and repetitive reading may be required to understand all details.

Module Loading

The server is bootstrapped by processing a MAL script with module definitions or extensions. For each module file encountered, the object library lib_<modulename>.so is searched for in the location identified by monet_mod_path=exec_prefix/lib/MonetDB5:exec_prefix/lib/MonetDB5/lib:exec_prefix/lib/MonetDB5/bin.

The corresponding signature are defined in .../lib(64)/<modulename>.mal.

The default bootstrap script is called .../lib/MonetDB5/mal_init.mal and it is designated in the configuration file as the mal_init property. The rationale for this set-up is that database administrators can extend/overload the bootstrap procedure without affecting the software package being distributed. It merely requires a different direction for the mal_init property. The scheme also isolates the functionality embedded in modules from inadvertise use on non-compliant databases.

Unlike previous versions of MonetDB, modules can not be unloaded. Dynamic libraries are always global and, therefore, it is best to load them as part of the server initialization phase.

Module file loading

The default location to search for the module is in monet_mod_path unless an absolute path is given. Loading further relies on the Linux policy to search for the module location in the following order: 1) the colon-separated list of directories in the user's LD_LIBRARY_PATH, 2) the libraries specified in /etc/ld.so.cache and 3) /usr/lib followed by /lib. If the module contains a routine _init, then that code is executed before the loader returns. Likewise the routine _fini is called just before the module is unloaded.

A module loading conflict emerges if a function is redefined. A duplicate load is simply ignored by keeping track of modules already loaded.

BAT extensions

The kernel libraries are unaware of the MAL runtime semantics. This calls for declaring some operations in the MAL module section and register them in the kernel modules explicitly.

A good example of this borderline case are BAT creation operations, which require a mapping of the type identifier to the underlying implementation type.

Another example concerns the (un)pack operations, which direct access the runtime stack to (push)pull the values needed.

pattern bat.new(ht:any_1, tt:any_2, b:bat[:any_3,:any_4]) :bat[:any_1,:any_2]
address CMDBATclone comment "Creates a new empty transient BAT by cloning another.";
pattern bat.new(ht:any_1, tt:any_2) :bat[:any_1,:any_2]
address CMDBATnew comment "Creates a new empty transient BAT, with head- and tail-types as indicated.";
pattern bat.new(ht:any_1, tt:any_2, size:int) :bat[:any_1,:any_2]
address CMDBATnewint comment "Creates a new BAT with sufficient space.";
pattern bat.new(ht:any_1, tt:any_2, size:lng) :bat[:any_1,:any_2]
address CMDBATnew comment "Creates a new BAT and allocate space.";
pattern bat.new(ht:oid, tt:any_2, size:int) :bat[:oid,:any_2]
address CMDBATnewint;
pattern bat.new(ht:oid, tt:any_2, size:lng) :bat[:oid,:any_2]
address CMDBATnew;
pattern bat.new(b:bat[:any_1,:any_2] ) :bat[:any_1,:any_2]
address CMDBATnewDerived;
pattern bat.new(b:bat[:any_1,:any_2], size:lng) :bat[:any_1,:any_2]
address CMDBATnewDerived;
command bat.new(nme:str):bat[:any_1,:any_2]
address CMDBATderivedByName comment "Localize a bat by name and produce a clone.";
command bat.reduce(b:bat[:any_1,:any_2]):bat[:any_1,:any_2]
address CMDBATreduce comment "Drop auxillary BAT structures.";
command bat.flush(b:bat[:any_1,:any_2]):void
address CMDBATflush comment "Designate a BAT as not needed anymore.";
pattern bat.setGarbage(b:bat[:any_1,:any_2]):void
address CMDBATsetGarbage comment "Designate a BAT as garbage.";
pattern bat.partition(b:bat[:any_1,:any_2]):bat[:any_1,:any_2]...
address CMDbatpartition comment "Create a series of cheap slices over the first argument. The BUNs are distributed evenly.";
pattern bat.partition(b:bat[:any_1,:any_2],pieces:int,part:int):bat[:any_1,:any_2]
address CMDbatpartition2 comment "Create a series of cheap slices over the first argument. The BUNs are distributed evenly.";
pattern bat.unpack(b:bat[:any_1,:any_2])(h:any_1,t:any_2)
address CMDbatunpack comment "Extract the first tuple from a BAT.";
pattern bat.pack(h:any_1,t:any_2):bat[:any_1,:any_2]
address CMDbatpack comment "Pack a pair of values into a BAT.";
pattern bat.setBase(b:bat[:any_1,:any_2],c:bat[:any_1,:any_2]...):void
address CMDsetBase comment "Give the non-empty BATs consecutive oid bases.";

BBP

The BBP module implements a box interface over the BAT buffer pool. It is primarilly meant to ease inspection of the BAT collection managed by the server.

The two predominant approaches to use bbp is to access the BBP with either bind or take. The former merely maps the BAT name to the object in the bat buffer pool. A more controlled scheme is to deposit, take, release and discard elements. Any BAT B created can be brought under this scheme with the name N. The association N->B is only maintained in the box administration and not reflected in the BAT descriptor. In particular, taking a BAT object out of the box leads to a private copy to isolate the user from concurrent updates on the underlying store. Upon releasing it, the updates are merged with the master copy [todo].

The remainder of this module contains operations that rely on the MAL runtime setting, but logically belong to the kernel/bat module.

module bbp;
command open():void
address CMDbbpopen comment "Locate the bbp box and open it.";
command close():void
address CMDbbpclose comment "Close the bbp box.";
command destroy():void
address CMDbbpdestroy comment "Destroy the box";
pattern take(name:str) :bat[:any_1,:any_2]
address CMDbbptake comment "Load a particular bat.";
pattern deposit(name:str,v:bat[:any_1,:any_2]) :void
address CMDbbpdeposit comment "Enter a new bat into the bbp box.";
pattern deposit(name:str,loc:str) :bat[:any_1,:any_2]
address CMDbbpbindDefinition comment "Relate a logical name to a physical BAT in the buffer pool.";
pattern commit():void
address CMDbbpReleaseAll comment "Commit updates for this client.";
pattern releaseAll():void
address CMDbbpReleaseAll comment "Commit updates for this client.";
pattern release(name:str,val:bat[:any_1,:any_2]) :void
address CMDbbprelease comment "Commit updates and release this BAT.";
pattern release(b:bat[:any_1,:any_2]):void
address CMDbbpreleaseBAT comment "Remove the BAT from further consideration";
pattern destroy(b:bat[:any_1,:any_2]):void
address CMDbbpdestroyBAT1 comment "Schedule a BAT for removal at session end.";
pattern destroy(b:bat[:any_1,:any_2],immediate:bit)
address CMDbbpdestroyBAT comment "Schedule a BAT for removal at session end or immediately.";
pattern toString(name:str):str
address CMDbbptoStr comment "Get the string representation of an element in the box.";
pattern discard(name:str):void
address CMDbbpdiscard comment "Remove the BAT from the box.";
pattern iterator(nme:str):lng
address CMDbbpiterator comment "Locate the next element in the box.";
pattern prelude():void
address CMDbbpprelude comment "Initialize the bbp box.";
pattern bind(name:str):bat[:any_1,:any_2]
address CMDbbpbind comment "Locate the BAT using its logical name";
pattern bind(head:str,tail:str):bat[:any_1,:any_2]
address CMDbbpbind2 comment "Locate the BAT using the head and tail names in the BAT buffer pool");
pattern bind(idx:int):bat[:any_1,:any_2]
address CMDbbpbindindex comment "Locate the BAT using its BBP index in the BAT buffer pool";
pattern getObjects():bat[:int,:str]
address CMDbbpGetObjects comment "View of the box content.";
command getHeadType() :bat[:int,:str]
address CMDbbpHeadType comment "Map a BAT into its head type";
command getTailType() :bat[:int,:str]
address CMDbbpTailType comment "Map a BAT into its tail type";
command getNames() :bat[:int,:str]
address CMDbbpNames comment "Map BAT into its bbp name";
command getRNames() :bat[:int,:str]
address CMDbbpRNames comment "Map a BAT into its bbp physical name";
command getName( b:bat[:any_1,:any_2]):str
address CMDbbpName comment "Map a BAT into its internal name";
command getCount() :bat[:int,:lng]
address CMDbbpCount comment "Create a BAT with the cardinalities of all known BATs";
command getRefCount() :bat[:int,:int]
address CMDbbpRefCount comment "Create a BAT with the (hard) reference counts";
command getLRefCount() :bat[:int,:int]
address CMDbbpLRefCount comment "Create a BAT with the logical reference counts";
command getLocation() :bat[:int,:str]
address CMDbbpLocation comment "Create a BAT with their disk locations";
command getHeat() :bat[:int,:int]
address CMDbbpHeat comment "Create a BAT with the heat values";
command getDirty() :bat[:int,:str]
address CMDbbpDirty comment "Create a BAT with the dirty/ diffs/clean status";
command getStatus() :bat[:int,:str]
address CMDbbpStatus comment "Create a BAT with the disk/load status";
command getKind():bat[:int,:str]
address CMDbbpKind comment "Create a BAT with the persistency status";
command getRefCount(b:bat[:any_1,:any_2]) :int
address CMDgetBATrefcnt comment "Utility for debugging MAL interpreter";
command getLRefCount(b:bat[:any_1,:any_2]) :int
address CMDgetBATlrefcnt comment "Utility for debugging MAL interpreter";
command getDiskSpace() :lng
address CMDbbpDiskSpace comment "Estimate the amount of disk space occupied by dbfarm";
command getPageSize():int
address CMDgetPageSize comment "Obtain the memory page size";

BPM

8.17 BAT Partition Manager

In real-life database applications the BATs tend to grow beyond the memory size. This leads to a heavy IO dominated behavior, which can partly be avoided by breaking up the query into a sequence of subqueries using a map-reduce strategy. The BAT partition manager (BPM) module is designed to support this strategy using range- and hash-partitioning.

Consider we want to reorganize R:bat[:oid,:int] into two partitions, based on splitting by tail value. The following MAL program illustrates the snippet of actions needed:

bpm.open(); Ralias:= bpm.deposit("myversion",R:bat[:oid,:int]); bpm.rangePartition(Ralias,nil:int,100); bpm.rangePartition(Ralias,101,200); bpm.close();

The command

bpm.deposit

registers a BAT as one for which a partitioned copy is required. The first partition call breaks the orginal BAT into two pieces: (nil:int,100) and (101,nil:int). The second call breaks the latter into (101,200) and (201,nil:int). The BAT partitions share the persistency properties. Partitioning on the head simple calls for a reverse operation on the source BAT first.

The partition manager also supports hash-based partitioning. Its argument is the number of hash bucket bits.

bpm.open(); Rev:= bat.reverse(R:bat[:oid,:int]); Ralias:= bpm.deposit("myHashVersion",Rev); # creates side effects bpm.hashPartition(Ralias,2); bpm.close();

This example creates a hash-partition based on the head.

The design is based on the assumption that partitions are reasonably large. This helps to limit plan explosion. (or a scheduler should step in)

8.17.1 Derived partitioning

A relational front-end would benefit from derived horizontal fragmentation. It would enable grouping together related fragments on the same site. Assume a relation R(A,B) which is already partitioned on A the derived fragmentation on the head is enforced with

bpm.derivePartition(B,A);

8.17.2 Using partitions

The partitioned BAT can be used in two ways. A query plan can be rewritten into a generator over the partitions, or it can be used by optimizers to derived all subqueries first for symbolic evaluation.

The former is illustrated with the snippet to select part of a partitioned BAT. In this example we collect the partial results in the accumulator BAT U.

bpm.open(); Ralias:bat[:oid,:int]:= bpm.take("myversion"); U:= bat.new(:oid,:int); barrier Rp:= bpm.newIterator(Ralias); ... t:= algebra.select(Rp,0,100); U:= algebra.union(tu,t); ... redo Rp:= bpm.hasMoreElements(Ralias); exit Rp; bpm.close();

The properties of the partitioned BATs are particularly useful during query optimization. However, it only works if the BAT identifier can be determined at compile time. For SQL it can be simply looked up in the catalog as part of a preparatory optimizer step.

To illustrate, the same problem handled by an optimizer that produces the plan based on a known number of partitions:

bpm.open(); R:bat[:oid,:int]:= bpm.take("myversion"); # get the partition alias optimizer.mergetable(); T:= algebra.select(R,0,100);

is translated to the plan:

bpm.open(); R:bat[:oid,:int]:= bpm.take("myversion"); # get the partition alias R0:bat[:oid,:int]:= bpm.take(R,0, nil:oid,nil:oid, 0,100); R1:bat[:oid,:int]:= bpm.take(R,1, nil:oid,nil:oid, 101,200); R2:bat[:oid,:int]:= bpm.take(R,2, nil:oid,nil:oid, 201,nil:int); R:= mat.new(R0,R1,R2); T:= algebra.select(R,0,100); optimizer.multitable();

In this translation Ri also gets the properties of the BATs. It is now up to the

mat

optimizer to decide about further plan expansion or an iterator approach.

8.17.3 Partition updates

The content of the partitions is preferrable updated in bulk. This calls for accumulation of insertions/deletions in pending update BATs, as already performed in the SQL code generator. Once the transaction is commited, the updates are propagated (in parallel) to all partitions.

bpm.open(); Ralias:bat[:oid,:int] := bpm.take("myversion"); bpm.insert(Ralias, Rinsert);# handle pending inserts bpm.delete(Ralias, Rdelete);# handle pending deletes bpm.replace(Ralias, Rold, Rnew);# handle pending updates bpm.close();

The

replace

operator works on the assumption that the head of

Rold

and

Rnew

is unique.

It remains possible to retrieve a partition and directly insert elements, but then it is up to the compiler to ensure that the boundery conditions are met.

8.17.4 Partitioned results

In many situations, you would like to keep the partial results as a partitioned BAT again. The easiest solution is to create a partitioned BAT, whose partitions are empty. Subsequently, we insert the temporary results. Depending on the fragmentation criteria, pieces may align with the pieces known, or lead to a redistribution of the buns to the correct bats.

In the previous plan for this becomes

bpm.open(); Tmp := bpm.deposit("tmp",:bat[:oid,:int]); bpm.rangePartition(tmp,nil:int,100); bpm.rangePartition(tmp,101,nil:int);

Ralias:bat[:oid,:int]:= bpm.take("myversion"); # get the partition alias R0:bat[:oid,:int]:= bpm.take("myversion", nil:oid,nil:oid, 0,100); T0:= algebra.select(R0,0,100); bpm.insert(Tmp,T0);

R1:bat[:oid,:int]:= bpm.take("myversion", nil:oid,nil:oid, 101,200); T1:= algebra.select(R1,0,100); bpm.insert(Tmp,T1);

R2:bat[:oid,:int]:= bpm.take("myversion", nil:oid,nil:oid, 201,nil:int); T2:= algebra.select(R2,0,100); bpm.insert(Tmp,T2);

Note that a symbolic optimizer can reduce this plan to a small snippet.

The rationale for the update approach is that re-distribution of temporary results are hidden behind the bpm.insert() interface. The only decision that should be taken by the optimizer is the fragmentation criteria for the temporary results.

For temporary results the range bounds need not be stored in the BPM catalog. Instead, the mat approach could be used to reduce the plan size.

bpm.open(); Ralias:= bpm.take("myversion",:bat[:oid,:int]); # get the partition alias R0:= bpm.take(Ralias, 0); T0:=algebra.select(R0,0,100);

R1:= bpm.take(Ralias, 1); T1:= algebra.select(R1,0,100);

R2:= bpm.take(Ralias, 2); R:= mat.new(T0,T1,T2); T2:=algebra.select(R2,0,100);

8.17.5 Partition iterators

The default strategy for an optimizer is to replace a reference to a partitioned BAT by an iterator.

l:= bpm.new(); barrier Elm:bat[:oid,:int]:= bpm.newIterator(Ralias); t:= algebra.select(Elm,0,20); bpm.addPartition(l,t); redo Elm:bat[:oid,:int]:= bpm.newIterator(Ralias); exit Elm;

Variations on this theme are iterators that search for partitions overlapping a range or those that are not empty.

8.17.6 Partition selection

Partition aware relational operators further reduce the performance overhead and at the same time avoid cluttering the MAL plans with too much flow of control constructs. A few operators relevant for the SQL environment will be added.

The select operation can be overloaded in the BPM to improve processing further. For example, the operation

t := bpm.select(Ralias,0,100);

extracts portions of all three partitions and creates a non-partitioned result BAT. If the partition bounds align with the selection criteria this operation becomes cheap. It can be used to convey information on the bounds to optimizers.

The lifetime of a partitioned table is inherited from its components. How to detect that a temporary BAT is removed from the BBP? Currently we have to explicitly call the bpm.garbage() on those partitioned BATs.

At the end of a query plan we have to garbage collect any of the left-over partitioned temporary tables.

Box

8.7 Box definitions

This module shows the behavior of a simple box of objects. Objects are stored into the box using deposit and taken out with take. Once you are done, elements can be removed by name or reference using discard.

A box should be opened before being used. It is typically used to set-up the list of current users and to perform authorization.

module box;
pattern open(nme:str):any_1
address BOXopen comment "Locate the box and open it.";
pattern close(bname:str):void
address BOXclose comment "Close the box.";
pattern destroy(bname:str):void
address BOXdestroy comment "Destroy the box.";
pattern take(bnme:str, vnme:str):any_1
address BOXtake comment "Locate the typed value in the box.";
pattern deposit(bname:str,name:str,v:any_1):void
address BOXdeposit comment "Enter a new value into the box.";
pattern releaseAll(bname:str) :void
address BOXreleaseAll comment "Release all objects for this client.";
pattern release(bname:str,nme:str,val:any_1):void
address BOXrelease comment "Release the BAT from the client pool.";
pattern toString(bname:str,name:str) :str
address BOXtoString comment "Get the string representation of the i-th element in the box.";
pattern discard(bname:str,name:str) :void
address BOXdiscard comment "Release the BAT from the client pool.";
pattern iterator(nme:str):lng
address BOXiterator comment "Locates the next element in the box.";
command getBoxNames():bat[:int,:str]
address BOXgetBoxNames comment "Retrieve the names of all boxes.";

Chopper

8.6 BAT Iterators

Many low level algorithms rely on an iterator to break a collection into smaller pieces. Each piece is subsequently processed by a block.

For very large BATs it may make sense to break it into chunks and process them separately to solve a query. An iterator pair is provided to chop a BAT into fixed size elements. Each chunk is made available as a BATview. It provides read-only access to an underlying BAT. Adjusting the bounds is cheap, once the BATview descriptor has been constructed.

The smallest granularity is a single BUN, which can be used to realize an iterator over the individual BAT elements. For larger sized chunks, the operators return a BATview.

All iterators require storage space to administer the location of the next element. The BAT iterator module uses a simple lng variable, which also acts as a cursor for barrier statements.

The larger chunks produced are currently static, i.e. their size is a parameter of the call. Dynamic chunk sizes are interesting for time-series query processing. (See another module)

command bat.newIterator(b:bat[:any_1,:any_2], size:lng)
(:lng,:bat[:any_1,:any_2]) address CHPnewChunkIterator comment "Create an iterator with fixed granule size. The result is a view.";
command bat.hasMoreElements(b:bat[:any_1,:any_2], size:lng)
(:lng, :bat[:any_1,:any_2]) address CHPhasMoreElements comment "Produce the next chunk for processing.";
pattern bat.newIterator(b:bat[:any_1,:any_2]) (:lng, h:any_1, t:any_2)
address CHPbunIterator comment "Process the buns one by one extracted from a void table.";
pattern bat.newIterator(b:bat[:any_1,:bat]) (:lng, h:any_1, t:any_2)
address CHPbunIterator comment "Process the buns one by one extracted from a void table.";
pattern bat.hasMoreElements(b:bat[:any_1,:any_2]) (:lng, h:any_1, t:any_2)
address CHPbunHasMoreElements;
pattern bat.hasMoreElements(b:bat[:oid,:any_2]) (:lng, h:oid, t:any_2)
address CHPbunHasMoreElements comment "Produce the next bun for processing.";
pattern bat.hasMoreElements(b:bat[:any_1,:bat]) (:lng, h:any_1, t:any_2)
address CHPbunHasMoreElements comment "Produce the next bun for processing.";

The head and tail values can also be extracted using the cursor. It points to the first bun in the chunk under consideration. It is often more effective due to use the iterator with automatic extraction of head and tail value; the overhead involved is much less.

pattern bat.getHead(b:bat[:any_1,:any],i:lng):any_1
address CHPgetHead comment "return the BUN head value using the cursor.";
pattern bat.getTail(b:bat[:any_2,:any_1],i:lng):any_1
address CHPgetTail comment "return the BUN tail value using the cursor.";

Clients

8.8 Client Management

Each online client is represented with an entry in the clients table. The client may inspect his record at run-time and partially change its properties. The administrator sees all client records and has the right to adjust global properties.

module clients;
pattern setListing(flag:int):int
address CLTsetListing comment "Turn on/off echo of MAL instructions: 2 - show mal instruction, 4 - show details of type resolutoin, 8 - show binding information.";
pattern setHistory(s:str)
address CLTsetHistory comment "Designate console history file for readline.";
pattern getId():int
address CLTgetClientId comment "Return a number that uniquely represents the current client.";
pattern getInfo( ):bat[:str,:str]
address CLTInfo comment "Pseudo bat with client attributes.";
pattern getScenario():str
address CLTgetScenario comment "Retrieve current scenario name.";
pattern setScenario(msg:str):str
address CLTsetScenario comment "Switch to other scenario handler, return previous one.";
pattern quit():void
address CLTquit comment "Terminate the client session.";
pattern quit(idx:int):void
address CLTquit comment "Terminate the session for a single client using a soft error. It is the privilige of the console user.";

Administrator operations

command getLogins( ):bat[:int,:str]
address CLTLogin comment "Pseudo bat of client login time.";
command getLastCommand( ):bat[:int,:str]
address CLTLastCommand comment "Pseudo bat of client's last command time.";
command getActions( ):bat[:int,:int]
address CLTActions comment "Pseudo bat of client's command counts.";
command getTime( ):bat[:int,:lng]
address CLTTime comment "Pseudo bat of client's total time usage(in usec).";
command getUsers( ):bat[:int,:str]
address CLTusers comment "Pseudo bat of users logged in.";
pattern stop(id:int)
address CLTstop comment "Stop the query execution at the next eligble statement.";
pattern suspend(id:int):void
address CLTsuspend comment "Put a client process to sleep for some time. It will simple sleep for a second at a time, until the awake bit has been set in its descriptor";
command wakeup(id:int):void
address CLTwakeup comment "Wakeup a client process";
pattern setTimeout(q:int,s:int):void
address CLTsetTimeout comment "Abort a query after q seconds (q=0 means run undisturbed). The session timeout aborts the connection after spending too many seconds on query processing.";
pattern getTimeout()(q:int,s:int)
address CLTgetTimeout comment "A query is aborted after q seconds (q=0 means run undisturbed). The session timeout aborts the connection after spending too many seconds on query processing.";
command shutdown(forced:bit):void
address CLTshutdown comment "Close all client connections. If forced=false the clients are moved into FINISHING mode, which means that the process stops at the next cycle of the scenario. If forced=true all client processes are immediately killed";

Constants

8.5 Constants

The const module provides a box abstraction store for global constants. Between sessions, the value of the constants is saved on disk in the form of a simple MAL program, which is scanned and made available by opening the box. A future implementation should provide transaction support over the box, which would permit multiple clients to exchange (scalar) information easily.

The default constant box is initialized with session variables, such as 'user','dbname', 'dbfarm', and 'dbdir'. These actions are encapsulated in the prelude routine called.

A box should be opened before being used. It is typically used to set-up the list of current users and to perform authorization. The constant box is protected with a simple authorization scheme, prohibiting all updates unless issued by the system administrator.

module const;
pattern open():void
address CSTopen comment "Locate and open the constant box.";
pattern close():void
address CSTclose comment "Close the constant box.";
pattern destroy():void
address CSTdestroy comment "Destroy the box.";
pattern take(name:str):any_1
address CSTtake comment "Take a variable out of the box.";
pattern deposit(name:str,val:any_1) :void
address CSTdeposit comment "Add a variable to the box.";
pattern releaseAll():void
address CSTreleaseAll comment "Release all variables in the box.";
pattern release(name:str) :void
address CSTrelease comment "Release a constant value.";
pattern release(name:any_1):void
address CSTrelease comment "Release a constant value.";
pattern toString(name:any_1):str
address CSTtoString comment "Get the string representation of an element in the box.";
pattern discard(name:any_1) :void
address CSTdiscard comment "Release the const from the box.";
pattern newIterator()(:lng,:str)
address CSTnewIterator comment "Locate next element in the box.";
pattern hasMoreElements()(:lng,:str)
address CSThasMoreElements comment "Locate next element in the box.";

Debugger

8.13 MAL debugger interface

This module provides access to the functionality offered by the MonetDB debugger and interpreter status. It is primarilly used in interactive sessions to activate the debugger at a given point. Furthermore, the instructions provide the necessary handle to generate information for post-mortum analysis.

To enable ease of debugging and performance monitoring, the MAL interpreter comes with a hardwired gdb-like text-based debugger. A limited set of instructions can be included in the programs themselves, but beware that debugging has a global effect. Any concurrent user will be affected by breakpoints being set.

The prime scheme to inspect the MAL interpreter status is to use the MAL debugger directly. However, in case of automatic exception handling it helps to be able to obtain BAT versions of the critical information, such as stack frame table, stack trace, and the instruction(s) where an exception occurred. The inspection typically occurs in the exception handling part of the MAL block.

Beware, a large class of internal errors can not easily captured this way. For example, bus-errors and segmentation faults lead to premature termination of the process. Similar, creation of the post-mortum information may fail due to an inconsistent state or insufficient resources.

module mdb;
pattern start():void
address MDBstart comment "Start interactive debugger";
pattern start(clientid:int):void
address MDBstart comment "Start interactive debugger on a client";
pattern start(mod:str,fcn:str):void
address MDBstartFactory comment "Start interactive debugger on a running factory";
pattern stop():void
address MDBstop comment "Stop the interactive debugger";
pattern inspect(mod:str,fcn:str):void
address MDBinspect comment "Run the debugger on a specific function";
command modules():bat[:int,:str]
address CMDmodules comment "List available modules";
pattern setTrap(mod:str, fcn:str, b:bit):void
address MDBtrapFunction comment "Suspend upon a call to the MAL function.";
pattern setTrap(idx:int):void
address mdbTrapClient comment "Call debugger for a specific process.";
pattern setTrace(b:bit):void
address MDBsetTrace comment "Turn on/off tracing of current routine";
pattern setTrace(b:str):void
address MDBsetVarTrace comment "Turn on/off tracing of a variable ";
pattern setCatch(b:bit):void
address MDBsetCatch comment "Turn on/off catching exceptions";
pattern setThread(b:bit):void
address MDBsetThread comment "Turn on/off thread identity for debugger";
pattern setTimer(b:bit):void
address MDBsetTimer comment "Turn on/off performance timer for debugger";
pattern setMemoryTrace(b:bit):void
address MDBsetBigfoot comment "Turn on/off memory foot print tracer for debugger";
pattern setFlow(b:bit):void
address MDBsetFlow comment "Turn on/off memory flow debugger";
pattern setMemory(b:bit):void
address MDBsetMemory comment "Turn on/off memory statistics tracing.";
pattern setIO(b:bit):void
address MDBsetIO comment "Turn on/off io statistics tracing";
pattern setCount(b:bit):void
address MDBsetCount comment "Turn on/off bat count statistics tracing";
command getDebug():int
address MDBgetDebug comment "Get the kernel debugging bit-set. See the MonetDB configuration file for details";
command setDebug(flg:str):int
address MDBsetDebugStr comment "Set the kernel debugging bit-set and return its previous value. The recognized options are: threads, memory, properties, io, transactions, modules, algorithms, estimates, xproperties";
command setDebug(flg:int):int
address MDBsetDebug comment "Set the kernel debugging bit-set and return its previous value.";
command getException(s:str):str
address MDBgetExceptionVariable comment "Extract the variable name from the exception message";
command getReason(s:str):str
address MDBgetExceptionReason comment "Extract the reason from the exception message";
command getContext(s:str):str
address MDBgetExceptionContext comment "Extract the context string from the exception message";
pattern list():void
address MDBlist comment "Dump the current routine on standard out.";
pattern listMapi():void
address MDBlistMapi comment "Dump the current routine on standard out with Mapi prefix.";
pattern list(M:str,F:str):void
address MDBlist3 comment "Dump the routine M.F on standard out.";
pattern List():void
address MDBlistDetail comment "Dump the current routine on standard out.";
pattern List(M:str,F:str):void
address MDBlist3Detail comment "Dump the routine M.F on standard out.";
pattern var():void
address MDBvar comment "Dump the symboltable of current routine on standard out.";
pattern var(M:str,F:str):void
address MDBvar3 comment "Dump the symboltable of routine M.F on standard out.";
pattern lifespan(M:str,F:str):void
address MDBlifespan comment "Dump the current routine lifespan information on standard out.";
pattern grab():void
address mdbGrab comment "Call debugger for a suspended process.";
pattern trap():void
address mdbTrap comment "A suspended process for debugging.";
pattern dot(M:str,F:str,s:str):void
address MDBshowFlowGraph comment "Dump the data flow of the function M.F in a format recognizable by the command 'dot' on the file s";
pattern getStackDepth():int
address MDBStkDepth comment "Return the depth of the calling stack.";
pattern getStackFrame(i:int):bat[:str,:str]
address MDBgetStackFrameN;
pattern getStackFrame():bat[:str,:str]
address MDBgetStackFrame comment "Collect variable binding of current (n-th) stack frame.";
pattern getStackTrace():bat[:void,:str]
address MDBStkTrace;
pattern dump()
address MDBdump comment "Dump instruction, stacktrace, and stack";
pattern getDefinition():bat[:void,:str]
address MDBgetDefinition comment "Returns a string representation of the current function with typing information attached";

Factories

8.9 Factory management

The factory infrastructure can be inspected and steered with the commands provided here.

module factories;
command getPlants()(mod:bat[:oid,:str], fcn:bat[:oid,:str])
address FCTgetPlants comment "Retrieve the names for all active factories.";
command getCaller():int
address FCTgetCaller comment "Retrieve the unique identity of the factory caller.";
command getOwners():bat[:oid,:str]
address FCTgetOwners comment "Retrieve the factory owners table.";
command getArrival():bat[:oid,:timestamp]
address FCTgetArrival comment "Retrieve the time stamp the last call was made.";
command getDeparture():bat[:oid,:timestamp]
address FCTgetDeparture comment "Retrieve the time stamp the last answer was returned.";
pattern shutdown(m:str, f:str):void
address FCTshutdown comment "Close a factory.";

I/O

8.11 Input/Output module

The IO module provides simple ascii-io rendering options. It is modeled after the tuple formats, but does not attempt to outline the results. Instead, it is geared at speed, which also means that some functionality regarding the built-in types is duplicated from the atoms definitions.

A functional limited form of formatted printf is also provided. It accepts at most one variable. A more complete approach is the tablet module.

The commands to load and save a BAT from/to an ASCII dump are efficient, but work only for binary tables.

module io;
pattern stdin():bstream
address io_stdin comment "return the input stream to the database client";
pattern stderr():streams
address io_stderr comment "return the error stream for the database console";
pattern stdout():streams
address io_stdout comment "return the output stream for the database client";
pattern print(val:any_1,lst:any...):void
address IOprint_val comment "Print a MAL value tuple .";
pattern print(b1:bat[:any_1,:any]...):void
address IOtable comment "BATs are printed with '#' for legend lines, and the BUNs on seperate lines between brackets, containing each to comma separated values (head and tail). If multiple BATs are passed for printing, print() performs an implicit natural join, producing a multi attribute table.";
pattern ftable( filep:streams, b1:bat[:any_1,:any], b:bat[:any_1,:any]... ):void
address IOftable comment "Print an n-ary table to a file.";
pattern print(order:int,b:bat[:any_1,:any], b2:bat[:any_1,:any]...):void
address IOotable comment "The same as normal table print, but enforces to use the order of BAT number [1..argc] to do the printing.";
pattern table(b1:bat[:any_1,:any], b2:bat[:any_1,:any]...):void
address IOttable comment "Print an n-ary table. Like print, but does not print oid column";
pattern table(order:int, b1:bat[:any_1,:any], b2:bat[:any_1,:any]...):void
address IOtotable comment "Print an n-ary table.";
pattern ftable(fp:streams, order:int, b1:bat[:any_1,:any], b:bat[:any_1,:any]...):void
address IOfotable comment "Print an n-ary table to a file.";
pattern print(val:any_1):void
address IOprint_val comment "Print a MAL value tuple .";
pattern print(val:bat[:any_1,:any_2]):void
address IOprint_val comment "Print a MAL value tuple .";
pattern prompt(val:any_1):void
address IOprompt_val comment "Print a MAL value without brackets.";
pattern printf(fmt:str,val:any...):void
address IOprintf comment "Select default format ";
pattern printf(fmt:str):void
address IOprintf comment "Select default format ";
pattern printf(filep:streams,fmt:str,val:any...):void
address IOprintfStream comment "Select default format ";
pattern printf(filep:streams,fmt:str):void
address IOprintfStream comment "Select default format ";
command data(fname:str):str
address IOdatafile comment "Signals receipt of tuples in a file fname. It returns the name of the file, if it still exists.";
command export(b:bat[:any_1,:any_2], filepath:str):bit
address IOexport comment "Export a BAT as ASCII to a file. If the 'filepath' is not absolute, it is put into the .../dbfarm/$DB directory. Success of failure is indicated.";
command import(b:bat[:any_1,:any_2], filepath:str) :bat[:any_1,:any_2]
address IOimport comment "Import a BAT from an ASCII dump. The new tuples are *inserted* into the parameter BAT. You have to create it! Its signature must match the dump, else parsing errors will occur and FALSE is returned.";

Inspect

8.10 Inspection

This module introduces a series of commands that provide access to information stored within the interpreter data structures. It's primary use is debugging. In all cases, the pseudo BAT operation is returned that should be garbage collected after being used.

The main performance drain would be to use a pseudo BAT directly to successively access it components. This can be avoided by first assigning the pseudo BAT to a variable.

module inspect;
command getWelcome():str
address INSPECTgetWelcome comment "Return the server message of the day string";
pattern getDefinition(mod:str,fcn:str) :bat[:str,:str]
address INSPECTgetDefinition comment "Returns a string representation of a specific function.";
pattern getSignature(mod:str,fcn:str) :bat[:str,:str]
address INSPECTgetSignature comment "Returns the function signature(s).";
pattern getAddress(mod:str,fcn:str) :bat[:str,:str]
address INSPECTgetAddress comment "Returns the function signature(s).";
pattern getComment(mod:str,fcn:str) :bat[:str,:str]
address INSPECTgetComment comment "Returns the function help information.";
pattern getSource(mod:str,fcn:str):str
address INSPECTgetSource comment "Return the original input for a function.";
pattern getKind():bat[:oid,:str]
address INSPECTgetkind comment "Obtain the instruction kind.";
pattern getModule():bat[:oid,:str]
address INSPECTgetAllModules comment "Obtain the function name.";
pattern getFunction():bat[:oid,:str]
address INSPECTgetAllFunctions comment "Obtain the function name.";
pattern getSignatures():bat[:oid,:str]
address INSPECTgetAllSignatures comment "Obtain the function signatures.";
pattern getAddresses():bat[:oid,:str]
address INSPECTgetAllAddresses comment "Obtain the function address.";
pattern getSize():lng
address INSPECTgetSize comment "Return the storage size for the current function (in bytes).";
pattern getSize(mod:str):bat[:str,:lng]
address INSPECTgetModuleSize comment "Return the storage size for a module (in bytes).";
pattern getSize(mod:str,fcn:str):lng
address INSPECTgetFunctionSize comment "Return the storage size for a function (in bytes).";
pattern getType(v:bat[:any_1,:any_2]) (ht:str, tt:str)
address INSPECTtypeName comment "Return the concrete type of a variable (expression).";
pattern getType(v:any_1) :str
address INSPECTtypeName comment "Return the concrete type of a variable (expression).";
command getTypeName(v:int):str
address INSPECTtypename comment "Get the type name associated with a type id.";
pattern getTypeIndex(v:bat[:any_1,:any_2]) (ht:int, tt:int)
address INSPECTtypeIndex comment "Return the type index of a BAT head and tail.";
pattern getTypeIndex(v:any_1):int
address INSPECTtypeIndex comment "Return the type index of a variable. For BATs, return the type index for its tail.";
pattern equalType(l:any, r:any):bit
address INSPECTequalType comment "Return true if both operands are of the same type";
command getAtomNames():bat[:int,:str]
address INSPECTatom_names comment "Collect a BAT with the atom names.";
command getAtomSuper():bat[:int,:str]
address INSPECTatom_sup_names comment "Collect a BAT with the atom names.";
command getAtomSizes():bat[:int,:int]
address INSPECTatom_sizes comment "Collect a BAT with the atom sizes.";
command getEnvironment():bat[:str,:str]
address INSPECTgetEnvironment comment "Collect the environment variables.

Language extension

8.12 Language Extensions

Iterators over scalar ranges are often needed, also at the MAL level. The barrier and control primitives are sufficient to mimic them directly.

The modules located in the kernel directory should not rely on the MAL datastructures. That's why we have to deal with some bat operations here and delegate the signature to the proper module upon loading.

Running a script is typically used to initialize a context. Therefore we need access to the runtime context. For the call variants we have to determine an easy way to exchange the parameter/return values.

module language;
command newRange(v:oid)(:bit,:oid)
address RNGnewRange_oid;
command newRange(v:sht)(:bit,:sht)
address RNGnewRange_sht;
command newRange(v:int)(:bit,:int)
address RNGnewRange_int;
command newRange(v:lng)(:bit,:lng)
address RNGnewRange_lng;
command newRange(v:flt)(:bit,:flt)
address RNGnewRange_flt;
command newRange(v:dbl)(:bit,:dbl)
address RNGnewRange_dbl comment "This routine introduces an iterator over a scalar domain.";
command nextElement(step:oid,last:oid)(:bit,:oid)
address RNGnextElement_oid;
command nextElement(step:sht,last:sht)(:bit,:sht)
address RNGnextElement_sht;
command nextElement(step:int,last:int)(:bit,:int)
address RNGnextElement_int;
command nextElement(step:lng,last:lng)(:bit,:lng)
address RNGnextElement_lng;
command nextElement(step:flt,last:flt)(:bit,:flt)
address RNGnextElement_flt;
command nextElement(step:dbl,last:dbl)(:bit,:dbl)
address RNGnextElement_dbl comment "Advances the iterator with a fixed value until it becomes >= last.";
command raise(msg:str) :str
address CMDraise comment "Raise an exception labeled with a specific message.";
command assert(v:bit,term:str):void
address MALassertBit;
command assert(v:sht,term:str):void
address MALassertSht;
command assert(v:int,term:str):void
address MALassertInt;
command assert(v:lng,term:str):void
address MALassertLng;
command assert(v:str,term:str):void
address MALassertStr;
command assert(v:oid,term:str):void
address MALassertOid;
pattern assert(v:any_1,pname:str,oper:str,val:any_2):void
address MALassertTriple comment "Assertion test.";
pattern assertSpace(depth:int)
address safeguardStack comment "Ensures that the current call does not consume more than depth*vtop elements on the stack.";
pattern dataflow():int
address MALstartDataflow comment "The current guarded block is executed using dataflow control. ";
pattern register(m:str,f:str,code:str,help:str):void
address CMDregisterFunction comment"Compile the code string and register it as a MAL function.";
pattern setMemoryTrace(flg:bit):void
address CMDsetMemoryTrace comment "Set the flag to trace the memory footprint";
pattern setThreadTrace(flg:bit):void
address CMDsetThreadTrace comment "Set the flag to trace the interpreter threads";
pattern setTimerTrace(flg:bit):void
address CMDsetTimerTrace comment "Set the flag to trace the execution time";
pattern setIOTrace(flg:bit):void
address CMDsetIOTrace comment "Set the flag to trace the IO";
pattern call(s:str):void
address CMDcallString comment "Evaluate a MAL string program.";
pattern call(s:bat[:oid,:str]):void
address CMDcallBAT comment "Evaluate a program stored in a BAT.";
pattern source(f:str):void
address CMDevalFile comment "Merge the instructions stored in the file with the current program.";

MAPI

8.15 MAPI interface

The complete Mapi library is available to setup communication with another Mserver.

Clients may initialize a private listener to implement specific services. For example, in an OLTP environment it may make sense to have a listener for each transaction type, which simply parses a sequence of transaction parameters.

Authorization of access to the server is handled as part of the client record initialization phase.

This library internally uses pointer handles, which we replace with an index in a locally maintained table. It provides a handle to easily detect havoc clients.

A cleaner and simplier interface for distributed processing is available in the module remote.

module mapi;
command listen():int
address SERVERlisten_default comment "Start a Mapi server with the default settings.";
command listen(port:int):int
address SERVERlisten_port comment "Start a Mapi listener on the port given.";
command listen(port:int, maxusers:int):int
address SERVERlisten2 comment "Start a Mapi listener.";
command listen(port:int, maxusers:int, cmd:str):int
address SERVERlisten3 comment "Start the Mapi listener on <port> for <maxusers>. For a new client connection MAL procedure <cmd>(Stream s_in, Stream s_out) is called.If no <cmd> is specified a new client thread is forked.";
command stop():void
address SERVERstop comment "Terminate connection listeners.";
command suspend():void
address SERVERsuspend comment "Suspend accepting connections.";
command resume():void
address SERVERresume comment "Resume connection listeners.";
command malclient(in:streams, out:streams):void
address SERVERclient comment "Start a Mapi client for a particular stream pair.";
command trace(mid:int,flag:int):void
address SERVERtrace comment "Toggle the Mapi library debug tracer.";
pattern reconnect(host:str, port:int, usr:str, passwd:str,lang:str):int
address SERVERreconnectWithoutAlias comment "Re-establish connection with a remote mserver.";
pattern reconnect(host:str, port:int, db_alias:str, usr:str, passwd:str,lang:str):int
address SERVERreconnectAlias comment "Re-establish connection with a remote mserver.";
command reconnect(mid:int):void
address SERVERreconnect comment "Re-establish a connection.";
pattern connect(host:str, port:int, usr:str, passwd:str,lang:str):int
address SERVERconnect comment "Establish connection with a remote mserver.";
command disconnect(dbalias:str):int
address SERVERdisconnectWithAlias comment "Close connection with a remote Mserver.";
command disconnect():int
address SERVERdisconnectALL comment "Close connections with all remote Mserver.";
command setAlias(dbalias:str)
address SERVERsetAlias comment "Give the channel a logical name.";
command lookup(dbalias:str):int
address SERVERlookup comment "Retrieve the connection identifier.";
command disconnect(mid:int):void
address SERVERdisconnect comment "Terminate the session.";
command destroy(mid:int):void
address SERVERdestroy comment "Destroy the handle for an Mserver.";
command ping(mid:int):int
address SERVERping comment "Test availability of an Mserver.";
command query(mid:int, qry:str):int
address SERVERquery comment "Send the query for execution";
command query_handle(mid:int, qry:str):int
address SERVERquery_handle comment "Send the query for execution.";
pattern query_array(mid:int, qry:str, arg:str...):int
address SERVERquery_array comment "Send the query for execution replacing '?' by arguments.";
command prepare(mid:int, qry:str):int
address SERVERprepare comment "Prepare a query for execution.";
command finish(hdl:int):int
address SERVERfinish comment "Remove all remaining answers.";
command get_field_count(hdl:int):int
address SERVERget_field_count comment "Return number of fields.";
command get_row_count(hdl:int):lng
address SERVERget_row_count comment "Return number of rows.";
command rows_affected(hdl:int):lng
address SERVERrows_affected comment "Return number of affected rows.";
command fetch_row(hdl:int):int
address SERVERfetch_row comment "Retrieve the next row for analysis.";
command fetch_all_rows(hdl:int):lng
address SERVERfetch_all_rows comment "Retrieve all rows into the cache.";
command fetch_field(hdl:int,fnr:int):str
address SERVERfetch_field_str comment "Retrieve a single field.";
command fetch_field(hdl:int,fnr:int):int
address SERVERfetch_field_int comment "Retrieve a single int field.";
command fetch_field(hdl:int,fnr:int):lng
address SERVERfetch_field_lng comment "Retrieve a single lng field.";
command fetch_field(hdl:int,fnr:int):sht
address SERVERfetch_field_sht comment "Retrieve a single sht field.";
command fetch_field(hdl:int,fnr:int):void
address SERVERfetch_field_void comment "Retrieve a single void field.";
command fetch_field(hdl:int,fnr:int):oid
address SERVERfetch_field_oid comment "Retrieve a single void field.";
command fetch_field(hdl:int,fnr:int):chr
address SERVERfetch_field_chr comment "Retrieve a single chr field.";
command fetch_field_array(hdl:int):bat[:int,:str]
address SERVERfetch_field_bat comment "Retrieve all fields for a row.";
command fetch_line(hdl:int):str
address SERVERfetch_line comment "Retrieve a complete line.";
command fetch_reset(hdl:int):int
address SERVERfetch_reset comment "Reset the cache read line.";
command next_result(hdl:int):int
address SERVERnext_result comment "Go to next result set.";
command error(mid:int):int
address SERVERerror comment "Check for an error in the communication.";
command getError(mid:int):str
address SERVERgetError comment "Get error message.";
command explain(mid:int):str
address SERVERexplain comment "Turn the error seen into a string.";
pattern put(mid:int, nme:str, val:any_1):void
address SERVERput comment "Send a value to a remote site.";
pattern put(nme:str, val:any_1):str
address SERVERputLocal comment "Prepare sending a value to a remote site.";
pattern rpc(key:int,qry:str...):any
address SERVERmapi_rpc_single_row comment "Send a simple query for execution and fetch result.";
pattern rpc(key:int,qry:str):bat[:any_1,:any_2]
address SERVERmapi_rpc_bat;
command rpc(key:int,qry:str):void
address SERVERquery comment "Send a simple query for execution.";
pattern bind(key:int,rschema:str,rtable:str,rcolumn:str,i:int):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";
pattern bind(key:int,rschema:str,rtable:str,i:int):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";
pattern bind(key:int,remoteName:str):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";

mapi.listen();

MAT

8.16 Multiple association tables

A MAT is a convenient way to deal represent horizontal fragmented tables. It combines the definitions of several, type compatible BATs under a single name. It is produced by the mitosis optimizer and the operations are the target of the mergetable optimizer.

The MAT is materialized when the operations can not deal with the components individually, or the incremental operation is not supported. Normally all mat.new() operations are removed by the mergetable optimizer. In case a mat.new() is retained in the code, then it will behaves as a mat.pack();

The primitives below are chosen to accomodate the SQL front-end to produce reasonable efficient code.

module mat;
pattern new(b:bat[:any_1,:any_2]...):bat[:any_1,:any_2]
address MATpack comment "Define a Merge Association Table (MAT). Faal back to the pack operation when this is called ";
pattern pack(:any_2...):bat[:void,:any_2]
address MATpackValues comment "Materialize the MAT (of values) into a BAT";
pattern pack(b:bat[:any_1,:any_2]...):bat[:any_1,:any_2]
address MATpack comment "Materialize the MAT into a BAT";
pattern pack2(b:bat[:any_1,:any_2]...):bat[:any_1,:any_2]
address MATpack2 comment "Materialize the MAT into a BAT (by an append all)";
pattern print(b:bat[:any_1,:any_2]...):void
address MATprint;
pattern newIterator(grp:bat[:any_1,:any_2]...):bat[:any_1,:any_2]
address MATnewIterator comment "Create an iterator over a MAT";
pattern hasMoreElements(grp:bat[:any_1,:any_2]...):bat[:any_1,:any_2]
address MAThasMoreElements comment "Find the next element in the merge table";
command info(g:str, e:str):bat[:any_1,:any_2]
address MATinfo comment "retrieve the definition from the partition catalogue";

Manual

8.14 Manual Inspection

This module introduces a series of commands that provide access to the help information stored in the runtime environment.

The manual bulk operations ease offline inspection of all function definitions. It includes an XML organized file, because we expect external tools to massage it further for presentation.

module manual;
pattern help(text:str)
address MANUALhelp comment "Produces a list of all <module>.<function> that match the text pattern. The wildcard '*' can be used for <module> and <function>. Using the '(' asks for signature information and using ')' asks for the complete help record.";
pattern search(text:str)
address MANUALsearch comment "Search the manual for command descriptions that match the regular expression 'text'";
pattern createXML(mod:str):void
address MANUALcreate1 comment "Generate a synopsis of a module";
pattern createXML():void
address MANUALcreate0 comment "Produces a XML-formatted manual over all modules loaded.";
pattern section(mod:str):void
address MANUALcreateSection comment "Generate a synopsis of a module for the reference manual";
pattern index():void
address MANUALcreateIndex comment "Produces an overview of all names grouped by module.";
pattern summary():void
address MANUALcreateSummary comment "Produces a manual with help lines grouped by module.";
pattern completion(pat:str):bat[:int,:str]
address MANUALcompletion comment "Produces the wordcompletion table.";

PCRE library

8.19 PCRE library interface

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. The current implementation of PCRE (release 4.x) corresponds approximately with Perl 5.8, including support for UTF-8 encoded strings. However, this support has to be explicitly enabled; it is not the default.

Profiler

8.18 Performance profiler

A key issue in developing fast programs using the Monet database back-end requires a keen eye on where performance is lost. Although performance tracking and measurements are highly application dependent, a simple to use tool makes life a lot easier.

Activation of the performance monitor has a global effect, i.e. all concurrent actions on the kernel are traced, but the events are only sent to the client initiated the profiler thread.

8.18.1 Monet Event Logger

The Monet Event Logger generates records of each event of interest indicated by a log filter, i.e. a pattern over module and function names.

The log record contents is derived from counters being (de-)activated. A complete list of recognized counters is shown below.

8.18.2 Execution tracing

Tracing is a special kind of profiling, where the information gathered is not sent to a remote system, but stored in the database itself. Each profile event is given a separate BAT

# thread and time since start
profiler.activate("tick");	
# cpu time in nano-seconds 
profiler.activate("cpu");	
# memory allocation information
profiler.activate("memory");
# IO activity
profiler.activate("io"); 
# Module,function,program counter
profiler.activate("pc"); 
# actual MAL instruction executed
profiler.activate("statement");	

The profiler event can be handled in several ways. The default strategy is to ship the event record immediately over a stream to a performance monitor. An alternative strategy is preparation of off-line performance analysis.

To reduce the interference of performance measurement with the experiments, the user can use an event cache, which is emptied explicitly upon need.

module profiler;
pattern activate(name:str...):void
address CMDactivateProfiler comment "A list of counters to be activated.";
pattern deactivate(name:str...):void
address CMDdeactivateProfiler comment "A list of counters to be deactivated.";
pattern openStream():void
address CMDopenProfilerStream comment "Send the events to output stream";
pattern openStream(fnme:str):void
address CMDsetProfilerFile comment "Send the log events to a file, stdout or console";
pattern openStream(host:str, port:int):void
address CMDsetProfilerStream comment "Send the log events to a stream ";
command closeStream():void
address CMDcloseProfilerStream comment "Stop sending the event records";
pattern setAll():void
address CMDsetAllProfiler comment "Short cut for setFilter(*,*).";
pattern setNone():void
address CMDsetNoneProfiler comment "Short cut for clrFilter(*,*).";
pattern setFilter(mod:str,fcn:str):void
address CMDsetFilterProfiler comment "Generate an event record for all function calls that satisfy the regular expression mod.fcn. A wildcard (*) can be used as name to identify all";
pattern setFilter(v:any):void
address CMDsetFilterVariable comment "Generate an event record for every instruction where v is used.";
pattern clrFilter(mod:str,fcn:str):void
address CMDclrFilterProfiler comment "Clear the performance trace bit of the selected functions.";
pattern clrFilter(v:any):void
address CMDsetFilterVariable comment "Stop tracing the variable" ;
pattern setStartPoint(mod:str,fcn:str):void
address CMDstartPointProfiler comment "Start performance tracing at mod.fcn";
pattern setEndPoint(mod:str,fcn:str)
address CMDendPointProfiler comment "End performance tracing after mod.fcn";
pattern start():void
address CMDstartProfiler comment "Start performance tracing";
command noop():void
address CMDnoopProfiler comment "Fetch any pending performance events";
pattern stop():void
address CMDstopProfiler comment "Stop performance tracing";
command reset():void
address CMDclearTrace comment "Clear the profiler traces";
command dumpTrace():void
address CMDdumpTrace comment "List the events collected";
command getTrace(e:str):bat[:int,:any_1]
address CMDgetTrace comment "Get the trace details of a specific event";
pattern getEvent()(:lng,:lng,:lng)
address CMDgetEvent comment "Retrieve the performance indicators of the previous instruction";
command cleanup():void
address CMDcleanup comment "Remove the temporary tables for profiling";
command getDiskReads():lng
address CMDgetDiskReads comment "Obtain the number of physical reads";
command getDiskWrites():lng
address CMDgetDiskWrites comment "Obtain the number of physical reads";
command getUserTime():lng
address CMDgetUserTime comment "Obtain the user timing information.";
command getSystemTime():lng
address CMDgetSystemTime comment "Obtain the user timing information.";
pattern getFootprint():lng
address CMDgetFootprint comment "Get the memory footprint and reset it";
pattern getMemory():lng
address CMDgetMemory comment "Get the amount of memory claimed and reset it";

Remote

8.20 Remote querying functionality

Communication with other mservers at the MAL level is a delicate task. However, it is indispensable for any distributed functionality. This module provides an abstract way to store and retrieve objects on a remote site. Additionally, functions on a remote site can be executed using objects available in the remote session context. This yields in four primitive functions that form the basis for distribution methods: get, put, register and exec.

The get method simply retrieves a copy of a remote object. Objects can be simple values, strings or BATs. The same holds for the put method, but the other way around. A local object can be stored on a remote site. Upon a successful store, the put method returns the remote identifier for the stored object. With this identifier the object can be addressed, e.g. using the get method to retrieve the object that was stored using put.

The get and put methods are symmetric. Performing a get on an identifier that was returned by put, results in an object with the same value and type as the one that was put. The result of such an operation is equivalent to making an (expensive) copy of the original object.

The register function takes a local MAL function and makes it known at a remote site. It ensures that it does not overload an already known operation remotely, which could create a semantic conflict. Deregister a function is forbidden, because it would allow for taking over the remote site completely. C-implemented functions, such as io.print() cannot be remotely stored. It would require even more complicated (byte?) code shipping and remote compilation to make it work. Currently, the remote procedure may only returns a single value.

The choice to let exec only execute functions was made to avoid problems to decide what should be returned to the caller. With a function it is clear and simple to return that what the function signature prescribes. Any side effect (e.g. io.print calls) may cause havoc in the system, but are currently ignored.

This leads to the final contract of this module. The methods should be used correctly, by obeying their contract. Failing to do so will result in errors and possibly undefined behaviour.

The resolve() function can be used to query Merovingian. It returns one or more databases discovered in its vincinity matching the given pattern.

module remote;

# module loading and unloading funcs

command prelude():void
address RMTprelude comment "Initialise the remote module.";
command epilogue():void
address RMTepilogue comment "Release the resources held by the remote module.";

# global connection resolve function

command resolve(pattern:str):bat[:oid,:str]
address RMTresolve comment "resolve a pattern against Merovingian and return the URIs";

# session local connection instantiation functions

command connect(uri:str, user:str, passwd:str):str
address RMTconnect comment "returns a newly created connection for uri, using user name and password";
command connect(uri:str, user:str, passwd:str, scen:str):str
address RMTconnectScen comment "returns a newly created connection for uri, using user name, password and scenario";
command disconnect(conn:str):void
address RMTdisconnect comment "disconnects the connection pointed to by handle (received from a call to connect()";

# core transfer functions

pattern get(conn:str, ident:str):any
address RMTget comment "retrieves a copy of remote object ident";
pattern put(conn:str, object:any):str
address RMTput comment "copies object to the remote site and returns its identifier";
pattern register(conn:str, mod:str, fcn:str):void
address RMTregister comment "register <mod>.<fcn> at the remote site";
pattern exec(conn:str, mod:str, func:str):str
address RMTexec comment "remotely executes <mod>.<func> and returns the handle to its result";
pattern exec(conn:str, mod:str, func:str)(:str, :str)
address RMTexec comment "remotely executes <mod>.<func> and returns the handle to its result";
pattern exec(conn:str, mod:str, func:str, :str...):str
address RMTexec comment "remotely executes <mod>.<func> using the argument list of remote objects and returns the handle to its result";
pattern exec(conn:str, mod:str, func:str, :str...)(:str, :str)
address RMTexec comment "remotely executes <mod>.<func> using the argument list of remote objects and returns the handle to its result";

Statistics

8.21 Statistics box.

Most optimizers need easy access to key information for proper plan generation. Amongst others, this volatile information consists of the tuple count, size, min- and max-value, the null-density, and a histogram of the value distribution.

The statistics are management by a Box, which gives a controlled environment to manage a collection of BATs and system variables.

BATs have to be deposit into the statistics box separately, because the costs attached maintaining them are high. The consistency of the statistics box is partly the responsibility of the upper layers. There is no automatic triggering when the BATs referenced are heavily modified or are being destroyed. They disappear from the statistics box the first time an invalid access is attempted or during system reboot.

The staleness of the information can be controlled in several ways. The easiest, and most expensive, is to assure that the statistics are updated when you start the server. Alternative, you can set a expiration interval, which will update the information only when it is considered expired. This test will be triggered either at server restart or your explicit call to update the statistics tables. The statistics table is commited each time you change it.

A forced update can be called upon when the front-end expects the situation to be changed drastically.

The statistics table is mostly used internally, but once in a while you need a dump for closed inspection. in your MAL program for inspection. Just use the BBP bind operation to locate them in the buffer pool.

module statistics;
pattern open():void
address STATopen comment "Locate and open the statistics box";
pattern close():void
address STATclose comment "Close the statistics box ";
pattern destroy():void
address STATdestroy comment "Destroy the statistics box";
pattern take(name:any_1):any_2
address STATtake comment "Take a variable out of the statistics box";
pattern deposit(name:str) :void
address STATdepositStr comment "Enter a new BAT into the statistics box";
pattern deposit(name:bat[:any_1,:any_2]) :void
address STATdeposit comment "Enter a new BAT into the statistics box";
pattern releaseAll():void
address STATreleaseAll comment "Release all variables in the box";
pattern release(name:str) :void
address STATreleaseStr comment "Release a single BAT from the box";
pattern release(name:bat[:any_1,:any_2]):void
address STATrelease comment "Release a single BAT from the box";
pattern toString(name:any_1):str
address STATtoString comment "Get the string representation of an element in the box";
pattern discard(name:str) :void
address STATdiscard comment "Release a BAT by name from the box";
pattern discard(name:bat[:any_1,:any_2]) :void
address STATdiscard2 comment "Release a BAT variable from the box";
pattern newIterator()(:lng,:str)
address STATnewIterator comment "Locate next element in the box";
pattern hasMoreElements()(:lng,:str)
address STAThasMoreElements comment "Locate next element in the box";
command update()
address STATupdate comment "Check for stale information";
command forceUpdate()
address STATforceUpdateAll comment "Bring all information up to date";
command forceUpdate(bnme:str)
address STATforceUpdate comment "Bring the statistics up to date for one BAT";
command prelude() :void
address STATprelude comment "Initialize the statistics package";
command epilogue() :void
address STATepilogue comment "Release the resources of the statistics package";
pattern dump() :void
address STATdump comment "Display the statistics table";
command getObjects():bat[:int,:str]
address STATgetObjects comment "Return a table with BAT names managed";
pattern getHotset():bat[:int,:str]
address STATgetHotset comment "Return a table with BAT names that have been touched since the start of the session";
pattern getCount(nme:str):lng
address STATgetCount comment "Return latest stored count information";
pattern getSize(nme:str):lng
address STATgetSize comment "Return latest stored count information";
pattern getMin(nme:str):lng
address STATgetMin comment "Return latest stored minimum information";
pattern getMax(nme:str):lng
address STATgetMax comment "Return latest stored maximum information";
pattern getHistogram(nme:str):bat[:any_1,:any_2]
address STATgetHistogram comment "Return the latest histogram");

Tablet

8.22 The table interface

A database cannot live without ASCII tabular print/dump/load operations. It is needed to produce reasonable listings, to exchange answers with a client, and to keep a database version for backup. This is precisely where the tablet module comes in handy. [This module should replace all other table dump/load functions]

We start with a simple example to illustrate the plain ASCII representation and the features provided. Consider the relational table answer(name:str, age:int, sex:chr, address:str, dob:date) obtained by calling the routine tablet.page(B1,...,Bn) where the Bi represent BATS.

[ "John Doe",		25,	'M',	"Parklane 5",	"25-12-1978" ]
[ "Maril Streep",	23,	'F',	"Church 5",	"12-07-1980" ]
[ "Mr. Smith",		53,	'M',	"Church 1",	"03-01-1950" ]

The lines contain the representation of a list in Monet tuple format. This format has been chosen to ease parsing by any front-end. The scalar values are represented according to their type. For visual display, the columns are aligned by placing enough tabs between columns based on sampling the underlying bat to determine a maximal column width. (Note,actual commas are superfluous).

The arguments to the command can be any sequence of BATs, but which are assumed to be aligned. That is, they all should have the same number of tuples and the j-th tuple tail of Bi is printed along-side the j-th tuple tail of Bi+1.

Printing both columns of a single bat is handled by tablet as a print of two columns. This slight inconvenience is catch-ed by the io.print(b) command, which resolves most back-ward compatibility issues.

In many cases, this output would suffice for communication with a front-end. However, for visual inspection the user should be provided also some meta information derived from the database schema. Likewise, when reading a table this information is needed to prepare a first approximation of the schema namings. This information is produced by the command tablet.header(B1,...,Bn), which lists the column role name. If no role name is give, a default is generated based on the BAT name, e.g. B1_tail.

#------------------------------------------------------#
# name,           age, sex, address,       dob         #
#------------------------------------------------------#
[ "John Doe",      25, 'M', "Parklane 5", "25-12-1978" ]
[ "Maril Streep",  23, 'F', "Church 5",   "12-07-1980" ]
[ "Mr. Smith",     53, 'M', "Church 1",   "03-01-1950" ]

The command tablet.display(B1,...,Bn) is a contraction of tablet.header(); tablet.page().

In many cases, the tablet produced may be too long to consume completely by the front end. In that case, the user needs page size control, much like the more/less utilities under Linux. However, no guarantee is given for arbitrarily going back and forth. [but works as long as we materialize results first ]. A portion of the tablet can be printed by identifying the rows of interest as the first parameter(s) in the page command, e.g.

tablet.page(nil,10,B1,...,Bn);	#prints first 10 rows
tablet.page(10,20,B1,...,Bn);	#prints next 10 rows
tablet.page(100,nil,B1,...,Bn);	#starts printing at tuple 100 until end

A paging system also provides the commands tablet.firstPage(), tablet.nextPage(), tablet.prevPage(), and tablet.lastPage() using a user controlled tablet size tablet.setPagesize(L).

The tablet display operations use a client (thread) specific formatting structure. This structure is initialized using either tablet.setFormat(B1,...,Bn) or tablet.setFormat(S1,...,Sn) (Bi is a BAT, Si a scalar). Subsequently, some additional properties can be set/modified, column width and brackets. After printing/paging the BAT resources should be freed using the command tablet.finish().

Any access outside the page-range leads to removal of the report structure. Subsequent access will generate an error. To illustrate, the following code fragment would be generated by the SQL compiler

	tablet.setFormat(B1,B2);
	tablet.setDelimiters("|","\t","|\n");
	tablet.setName(0, "Name");
	tablet.setNull(0, "?");
	tablet.setWidth(0, 15);
	tablet.setBracket(0, " ", ",");
	tablet.setName(1, "Age");
	tablet.setNull(1, "-");
	tablet.setDecimal(1, 9,2);
	tablet.SQLtitle("Query: select * from tables");
	tablet.page();
	tablet.SQLfooter(count(B1),cpuTicks);

This table is printed with tab separator(s) between elements and the bar (|) to mark begin and end of the string. The column parameters give a new title, a null replacement value, and the preferred column width. Each column value is optionally surrounded by brackets. Note, scale and precision can be applied to integer values only. A negative scale leads to a right adjusted value.

The title and footer operations are SQL specific routines to decorate the output.

Another example involves printing a two column table in XML format. [Alternative, tablet.XMLformat(B1,B2) is a shorthand for the following:]

	tablet.setFormat(B1,B2);
	tablet.setTableBracket("<rowset>","</rowset>");
	tablet.setRowBracket("<row>","</row>");
	tablet.setBracket(0, "<name>", "</name>");
	tablet.setBracket(1, "<age>", "</age>");
	tablet.page();

8.22.1 Tablet properties

More detailed header information can be obtained with the command tablet.setProperties(S), where S is a comma separated list of properties of interest, followed by the tablet.header(). The properties to choose from are: bat, name, type, width, sorted, dense, key, base, min, max, card,....

#--------------------------------------#
# B1,   B2,     B3,     B4,     B5     # BAT
# str,  int,    chr,    str,    date   # type
# true, false,  false,  false,  false  # sorted
# true, true,   false,  false,  false  # key
# ,     23,     'F',    ,              # min
# ,     53,     'M',	,              # max
# 4,     4,     4,      4,      4      # count
# 4,i    3,     2,      2,      3      # card
# name,	age,    sex,   address, dob    # name
#--------------------------------------#

8.22.2 Scalar tablets

In line with the 10-year experience of Monet, printing scalar values follow the tuple layout structure. This means that the header() command is also applicable. For example, the sequence "i:=0.2;v:=sin(i); tablet.display(i,v);" produces the answer:

#----------------#
# i,	v	 #
#----------------#
[ 0.2,	0.198669 ]
#----------------#

All other formatted printing should be done with the printf() operations contained in the module io.

8.22.3 Tablet dump/restore

Dump and restore operations are abstractions over sequence of tablet commands. The command tablet.dump(stream,B1,...,Bn) is a contraction of the sequence tablet.setStream(stream); tablet.setProperties("name,type,dense,sorted,key,min,max"); tablet.header(B1,..,Bn); tablet.page(B1,..,Bn). The result can be read by tablet.load(stream,B1,..,Bn) command. If loading is successful, e.g. no parsing errors occurred, the tuples are appended to the corresponding BATs.

8.22.4 Front-end extension

A general bulk loading of foreign tables, e.g. CSV-files and fixed position records, is not provided. Instead, we extend the list upon need. Currently, the routines tablet.SQLload(stream,delim1,delim2, B1,..,Bn) reads the files using the Oracle(?) storage. The counterpart for dumping is tablet.SQLdump(stream,delim1,delim2);

8.22.5 The commands

The load operation is for bulk loading a table, each column will be loaded into its own bat. The arguments are void-aligned bats describing the input, ie the name of the column, the tuple separator and the type. The nr argument can be -1 (The input (datafile) is read until the end) or a maximum.

The dump operation is for dumping a set of bats, which are aligned. Again with void-aligned arguments, with name (currently not used), tuple separator (the last is the record separator) and bat to be dumped. With the nr argument the dump can be limited (-1 for unlimited).

The output operation is for ordered output. A bat (possibly form the collection) gives the order. For each element in the order bat the values in the bats are searched, if all are found they are output in the datafile, with the given separators.

The scripts from the tablet.mil file are all there too for backward compatibility with the old Mload format files.

The load_format loads the format file, since the old format file was in a table format it can be loaded with the load command.

The result from load_format can be used with load_data to load the data into a set of new bats.

These bats can be made persistent with the make_persistent script or merge with existing bats with the merge_data script.

The dump_format scripts dump a format file for a given set of to be dumped bats. These bats can be dumped with dump_data.

module tablet;
command load( names:bat[:oid,:str], seps:bat[:oid,:str],
types:bat[:oid,:str], datafile:str, nr:int ) :bat[:str,:bat] address CMDtablet_load comment "Load a bat using specific format.";
command input( names:bat[:oid,:str], seps:bat[:oid,:str],
types:bat[:oid,:str], s:streams, nr:int ) :bat[:str,:bat] address CMDtablet_input comment "Load a bat using specific format.";
command dump(names:bat[:oid,:str], seps:bat[:oid,:str],
bats:bat[:oid,:bat], datafile:str, nr:int) :void address CMDtablet_dump comment "Dump the bat in ASCII format";
command output(order:bat[:any_1,:any_2], seps:bat[:oid,:str],
bats:bat[:oid,:bat], s:streams) :void address CMDtablet_output comment "Send the bat to an output stream.";
pattern display(v:any...):int
address TABdisplayRow comment "Display a formatted row";
pattern display(v:bat[:any_1,:any]...):int
address TABdisplayTable comment "Display a formatted table";
pattern page(b:bat[:any_1,:any]...):int
address TABpage comment "Display all pages at once without header";
pattern header(b:any...):int
address TABheader comment "Display the minimal header for the table";
pattern setProperties(prop:str):int
address TABsetProperties comment "Define the set of properties";
pattern dump(s:streams,b:bat[:any,:any]...):int
address TABdump comment "Print all pages with header to a stream";
pattern setFormat(b:any...):void
address TABsetFormat comment "Initialize a new reporting structure.";
pattern finish():void
address TABfinishReport comment "Free the storage space of the report descriptor";
pattern setStream(s:streams):void
address TABsetStream comment "Redirect the output to a stream.";
pattern setPivot(b:bat[:void,:oid]) :void
address TABsetPivot comment "The pivot bat identifies the tuples of interest. The only requirement is that all keys mentioned in the pivot tail exist in all BAT parameters of the print comment. The pivot also provides control over the order in which the tuples are produced.";
pattern setDelimiter(sep:str):void
address TABsetDelimiter comment "Set the column separator.";
pattern setTableBracket(lbrk:str,rbrk:str)
address TABsetTableBracket comment "Format the brackets around a table";
pattern setRowBracket(lbrk:str,rbrk:str)
address TABsetRowBracket comment "Format the brackets around a row";

Set the column properties

pattern setColumn(idx:int, v:any_1)
address TABsetColumn comment "Bind i-th output column to a variable";
pattern setName(idx:int, nme:str)
address TABsetColumnName comment "Set the display name for a given column";
pattern setBracket(idx:int,lbrk:str,rbrk:str)
address TABsetColumnBracket comment "Format the brackets around a field";
pattern setNull(idx:int, fmt:str)
address TABsetColumnNull comment "Set the display format for a null value for a given column";
pattern setWidth(idx:int, maxwidth:int)
address TABsetColumnWidth comment "Set the maximal display witdh for a given column. All values exceeding the length are simple shortened without any notice.";
pattern setPosition(idx:int,f:int,i:int)
address TABsetColumnPosition comment "Set the character position to use for this field when loading according to fixed (punch-card) layout.";
pattern setDecimal(idx:int,s:int,p:int)
address TABsetColumnDecimal comment "Set the scale and precision for numeric values";
pattern setTryAll()
address TABsetTryAll comment "Skip error lines and assemble an error report";
pattern setComplaints(b:bat[:oid,:str]) :void
address TABsetComplaints comment "The comlaints bat identifies all erroneous lines encountered ";
command firstPage():void
address TABfirstPage comment "Produce the first page of output";
command lastPage():void
address TABlastPage comment "Produce the last page of output";
command nextPage():void
address TABnextPage comment "Produce the next page of output";
command prevPage():void
address TABprevPage comment "Produce the prev page of output";
command getPageCnt():void
address TABgetPageCnt comment "Return the size in number of pages";
command getPage(i:int):void
address TABgetPage comment "Produce the i-th page of output";

8.22.6 Raw Load

Front-ends can bypass most of the overhead in loading the BATs by preparing the corresponding files directly and replace those created by e.g. the SQL frontend. This strategy is only advisable for cases where we have very large files >200GB and/or are created by a well debugged code.

To experiment with this approach, the code base responds on negative number of cores by dumping the data directly in BAT storage format into a collections of files on disk. It reports on the actions to be taken to replace BATs. This technique is initially only supported for fixed-sized columns.

Transactions

8.23 Transaction management

In the philosophy of Monet, transaction management overhead should only be paid when necessary. Transaction management is for this purpose implemented as a module. This code base is largely absolute and should be re-considered when serious OLTP is being supported. Note, however, the SQL front-end obeys transaction semantics.

module transaction;
command sync():bit
address TRNglobal_sync comment "Save all persistent BATs";
command commit():bit
address TRNglobal_commit comment "Global commit on all BATs";
command abort():bit
address TRNglobal_abort comment "Global abort on all BATs";
command subcommit(b:bat[:any_1,:str]):bit
address TRNsubcommit comment "commit only a set of BATnames, passed in the tail (to which you must have exclusive access!)";
pattern commit(c:any...)
address TRNtrans_commit comment "Commit changes in certain BATs.";
pattern abort(c:any...)
address TRNtrans_abort comment "Abort changes in certain BATs.";
pattern clean(c:any...)
address TRNtrans_clean comment "Declare a BAT clean without flushing to disk.";
command prev(b:bat[:any_1,:any_2]):bat[:any_1,:any_2]
address TRNtrans_prev comment "The previous stae of this BAT";
command alpha(b:bat[:any_1,:any_2]) :bat[:any_1,:any_2]
address TRNtrans_alpha comment "List insertions since last commit.";
command delta(b:bat[:any_1,:any_2]) :bat[:any_1,:any_2]
address TRNtrans_delta comment "List deletions since last commit.";

Atom modules

Atomary Types

MonetDB supports an extensible type system to accomodate a wide spectrum of database kernels and application needs. The type administration keeps track of their properties and provides access to the underlying implementations.

MAL recognizes the definition of a new type by replacing the module keyword with atom. Atoms definitions require special care, because their definition and properties should be communicated with the kernel library. The commands defined in an atom block are screened as of interest to the library.

MonetDB comes with the hardwired types bit, chr, sht, int, lng, oid, flt, dbl, str and bat, the representation of a bat identifier. The kernel code has been optimized to deal with these types efficiently, i.e. without unnecessary function call overheads.

A small collection of user-defined atom types is shipped with the sysem. They implement types considered essential for end-user applications, such as color, date, daytime, time, timestamp, timezone, blob, and inet, url. They are implemented using the type extension mechanism described below. As such, they provide examples for future extensions. A concrete example is the 'blob' datatype in the MonetDB atom module library(see ../modules/atoms/blob.mx)

The MAL type module overloads the atom structure managed in the GDK library. For the time being, we assume GDK to support at most 127 different atomic types. Type composition is limited to at most two builtin types to form a BAT. Furthermore, the polymorphic type any can be qualified with a type variable index any$I, where I is a digit (1-9). Beware, the TYPE_any is a speudo type known within MAL only.

Defining your own types

For the courageous at heart, you may enter the difficult world of extending the kernel library. The easiest way is to derive the atom modules from one shipped in the source distributed. More involved atomary types require a study of the documentation associated with the atom structures (gdk_atoms), because you have to develop a handful routines complying with the signatures required in the kernel library. They are registered upon loading the atom module.

Kernel modules

The kernel code modules are encapsulated with MAL wrappers. A synopsis of their functionality is described below. The signature details can be found in the appendix.

Aggregation

8.35 Aggregates Module

This module contains some efficient aggregate functions that compute their result in one scan, rather than in the iterative manner of the generic MIL aggregrate implementations.

The implementation code is derived from the original 'aggr' module. It uses a complete type-specific code expansion to avoid any type-checking in the inner-most loops. Where feasible, it replaced (expansive) hash-lookup by significantly cheaper positional void-lookups (if the head-column of the group-extend BAT ("e") is "void") or at least by (also positional) array lookups (in case the group-ids span a reasonably small range);

In addition to the 2-parameter

Alarm

8.36 Timers and Timed Interrupts

This module handles various signaling/timer functionalities. The Monet interface supports two timer commands: alarm and sleep. Their argument is the number of seconds to wait before the timer goes off. The sleep command blocks till the alarm goes off. The alarm command continues directly, executes off a MIL string when it goes off. The parameterless routines time and ctime provide access to the cpu clock.They return an integer and string, respectively.

Algebra

8.37 BAT Algebra

This modules contains the most common algebraic BAT manipulation commands. We call them algebra, because all operations take values as parameters, and produce new result values, but do not modify their parameters. Unlike the previous Monet versions, we reduce the number of functions returning a BAT reference. This was previously needed to simplify recursive bat-expression and manage reference counts. In the current version we return only a BAT identifier when a new bat is being created.

All parameters to the modules are passed by reference. In particular, this means that string values are passed to the module layer as (str *) and we have to de-reference them before entering the gdk library. This calls for knowlegde on the underlying BAT typs`s

BAT

8.39 Binary Association Tables

This module contains the commands and patterns to manage Binary Association Tables (BATs). The relational operations you can execute on BATs have the form of a neat algebra, described in algebra.mx

But a database system needs more that just this algebra, since often it is crucial to do table-updates (this would not be permitted in a strict algebra).

All commands needed for BAT updates, property management, basic I/O, persistence, and storage options can be found in this module.

All parameters to the modules are passed by reference. In particular, this means that string values are passed to the module layer as (str *) and we have to de-reference them before entering the gdk library. (Actual a design error in gdk to differentiate passing int/str) This calls for knowledge on the underlying BAT types`s

Calculator

8.53 Basic arithmetic

This module is an extended version of the V4 arithmetic module. It implements the arithmetic operations on the built-in types, chr, bte, sht, int, flt, dbl and lng. All combinations are implemented. Limited combinations are implemented for bit, oid and str.

[binary operators]
The implemented operators are first of all all comparison that return a TRUE/FALSE value (bit values), i.e. <=, <, ==, !=, >=, and >=.

The module also implements the operators +, -, * and /. The rules for the return types operators is as follows. If one of the input types is a floating point the result will be a floating point. The largest type of the input types is taken.

The max and min functions return the maximum and minimum of the two input parameters.

[unary operators]
This module also implements the unary abs() function, which calculates the absolute value of the given input parameter, as well as the - unary operator.

The inv unary operation calculates the inverse of the input value. An error message is given when the input value is zero.

[bitwise operators]
For integers there are some additional operations. The % operator implements the congruent modulo operation. The << and >> are the left and right bit shift. The or, and, xor and not for integers are implemented as bitwise boolean operations.
[boolean operators]
The or, and, xor and not for the bit atomic type in MIL (this corresponds to what is normally called boolean) are implemented as the logic operations.
[random numbers]
This module also contains the rand and srand functions. The srand() function initializes the random number generator using a seed value. The subsequent calls to rand() are pseudo random numbers (with the same seed the sequence can be repeated).

The general interpretation for the NIL value is "unknown". This semantics mean that any operation that receives at least one NIL value, will produce a NIL value in the output for sure.

The only exception to this rule are the "==" and "!=" equality test routines (it would otherwise become rather difficult to test whether a value is nil).

 

Groups

8.55 The group module

This module contains the primitives to construct, derive, and perform statistical operations on BATs representing groups. The default scheme in Monet is to assume the head to represent the group identifier and the tail an element in the group.

Groups play an important role in datamining, where they are used to construct cross-tables. Such cross tables over a single BAT are already supported by the histogram function. This module provides extensions to support identification of groups in a (multi-)dimensional space.

The module implementation has a long history. The first implementation provided several alternatives to produce/derive the grouping. A more complete (and complex) scheme was derived during its extensive use in the context of the Data Distilleries product. The current implementation is partly a cleanup of this code-base, but also enables provides better access to the intermediate structures produced in the process, i.e. the histogram and the sub-group mapping. They can be used for various optimization schemes at the MAL level.

The prime limitation of the current implementation is that an underlying database of oid->any BATs is assumed. This enables representation of each group using an oid, and the value representation of the group can be accordingly be retrieved easily. An optimized implementation in which we use positional integer id's (as embodied by Monet's void type) is also available.

This limitation on (v)oid-headers is marginal. The primitive GRPsplit produces for any BAT two copies with both a (v)oid header.

8.55.1 Algorithms

There are several approaches to build a cross table. The one chosen here is aimed at incremental construction, such that re-use of intermediates becomes possible. Starting with the first dimension, a BAT is derived to represent the various groups, called a GRP BAT or cross-table BAT.

8.55.2 Cross Table (GRP)

A cross table is an <oid,oid> BAT where the first (head) denotes a tuple in the cross table and the second (tail) marks all identical lists. The tail-oids contain group identifiers; that is, this value is equal iff two tuples belong to the same group. The group identifiers are chosen from the domain of the tuple-identifiers. This simplifies getting back to the original tuples, when talking about a group. If the tuple-oid of 'John' is chosen as a group-id, you might view this as saying that each member of the group is 'like John' with respect to the grouping-criterion.

Successively the subgroups can be identified by modifying the GRP BAT or to derive a new GRP BAT for the subgroups. After all groups have been identified this way, a BAT histogram operation can be used to obtain the counts of each data cube. Other aggregation operations using the MIL set aggregate construct (bat) can be used as well; note for instance that histogram == (b.reverse()).

The Monet interface module specification is shown below. Ideally we should defined stronger type constraints, e.g. command group.new(attr:bat[,:any_1]

The group macro is split along three dimensions:

[type:] Type specific implementation for selecting the right hash function and data size etc.;
[clustered:] The select the appropriate algorithm, i.e., with or without taking advantage of an order of values in the parent groups;
[physical properties:] Values , choosing between a fixed predefined and a custom hashmask. Custom allows the user to determine the size of the hashmask (and indirectly the estimated size of the result). The hashmask is 2^n - 1 where n is given by the user, or 1023 otherwise, and the derived result size is 4 ... 2^n.

Further research should point out whether fitting a simple statistical model (possibly a simple mixture model) can help choose these parameters automatically; the current idea is that the user (which could be a domain-specific extension of the higher-level language) knows the properties of the data, especially for IR in which the standard grouping settings differ significantly from the original datamining application.

Locks

8.56 Lightweight Lock Module

This module provides simple SMP lock and thread functionality as already present in the MonetDB system.

Logger

8.57 The Transaction Logger

In the philosophy of MonetDB, transaction management overhead should only be paid when necessary. Transaction management is for this purpose implemented as a separate module and applications are required to obey the transaction policy, e.g. obtaining/releasing locks.

This module is designed to support efficient logging of the SQL database. Once loaded, the SQL compiler will insert the proper calls at transaction commit to include the changes in the log file.

The logger uses a directory to store its log files. One master log file stores information about the version of the logger and the transaction log files. This file is a simple ascii file with the following format: 6DIGIT-VERSION\n[log file number \n]*]* The transaction log files have a binary format, which stores fixed size logformat headers (flag,nr,bid), where the flag is the type of update logged. The nr field indicates how many changes there were (in case of inserts/deletes). The bid stores the bid identifier.

The key decision to be made by the user is the location of the log file. Ideally, it should be stored in fail-safe environment, or at least the log and databases should be on separate disk columns.

This file system may reside on the same hardware as the database server and therefore the writes are done to the same disk, but could also reside on another system and then the changes are flushed through the network. The logger works under the assumption that it is called to safeguard updates on the database when it has an exclusive lock on the latest version. This lock should be guaranteed by the calling transaction manager first.

Finding the updates applied to a BAT is relatively easy, because each BAT contains a delta structure. On commit these changes are written to the log file and the delta management is reset. Since each commit is written to the same log file, the beginning and end are marked by a log identifier.

A server restart should only (re)process blocks which are completely written to disk. A log replay therefore ends in a commit or abort on the changed bats. Once all logs have been read, the changes to the bats are made persistent, i.e. a bbp sub-commit is done.

Mathematics

.50 BAT math calculator

This module contains the multiplex versions of the linked in mathematical functions.

8.51 The math module

This module contains the math commands. The implementation is very simply, the c math library functions are called. See for documentation the ANSI-C/POSIX manuals of the equaly named functions.

NOTE: the operand itself is being modified, rather than that we produce a new BAT. This to save the expensive copying.

Priority queues

8.59 Priority queues

This module includes functions for accessing and updating a pqueue. A pqueue is an (oid,any) bat. The tail is used as a comparison key. The first element of the pqueue is the smallest one in a min-pqueue or the largest one in a max-pqueue. Each element is larger than (smaller than) or equal to its parent which is defined by (position/2) if position is odd or (position-1)/2 if position is even (positions are from 0 to n-1). The head of the bat is used to keep track of the object-ids which are organized in the heap with respect to their values (tail column).

Status

8.60 System state information

This document introduces a series of bats and operations that provide access to information stored within the Monet Version 5 internal data structures. In all cases, pseudo BAT operation returns a transient BAT that should be garbage collected after being used.

The main performance drain would be to use a pseudo BAT directly to successively access it components. This can be avoided by first assigning the pseudo BAT to a variable.

8.61 Unix standard library calls

The unix module is currently of rather limited size. It should include only those facilities that are UNIX specific, i.e. not portable to other platforms. Similar modules may be defined for Windows platforms

Temporals

8.52 Time/Date multiplexes

[TODO: arithmetic multiplexes] The collection of routines provided here are map operations for the atom time and date primitives.

In line with the batcalc module, we assume that if two bat operands are provided that they are already aligned on the head. Moreover, the head of the BATs are limited to :void, which can be cheaply realized using the GRPsplit operation.

Appendices

The MAL Syntax

The MAL syntax

The MAL syntax is summarized below in extended BNF. Alternative constructs are seperated by | and grouped by parenthesis. Optional parts are marked with square brackets. A repetition is marked with either '+' or '*' to indicate at least once and many times, respectively. Lexical tokens are illustrated in small capitals.

program : (statement ';') *
statement : moduleStmt [helpinfo] | definition [helpinfo]
  | includeStmt | stmt
moduleStmt : MODULE ident | ATOM ident [':'ident]
includeStmt : INCLUDE identifier | INCLUDE string_literal
definition : COMMAND header ADDRESS identifier
  | PATTERN header ADDRESS identifier
  | FUNCTION header statement* END identifier
  | FACTORY header statement* end identifier
helpinfo : COMMENT string_literal
header : [ moduelName '.'] name '(' params ')' result
result : paramType | '(' params ')'
params : binding [',' binding]*
binding : identifier typeName [ props ]
typeName : scalarType | collectionType | ':' any ['_' digit]
scalarType : ':' identifier
collectionType : ':' bat ['[' colType ',' colType ']']
colType : scalarType | anyType
props : '{' property [ ',' property]* '}'
property : identifier [ compare literal]
compare : '=' | '<' | '<=' | '>' | '>=' | '!='
stmt : [tag] varlist [':=' expr ]
tag | RETURN  | BARRIER | CATCH | LEAVE | REDO | RAISE
varlist : variable | '(' variable [',' variable ] * ')'
variable : identifier [ props ]
expr : fcncall | [factor operator] factor
factor : literal_constant | nil | var
fcncall : moduleName '.' name '(' [args] ')'
args : factor [',' factor]*