Type system

Type system mk Tue, 03/30/2010 - 11:58

MonetDB supports an extensible type system to accomodate a wide spectrum of database kernels and application needs. The type administration keeps track of their properties and provides access to the underlying implementations. MonetDB comes with the hardwired scalar types bit, bte, sht, int, lng,  hge, oid, flt, dbl and str. The kernel code has been optimized to deal with these types efficiently, i.e. without unnecessary function call overheads.

A small collection of user-defined atom types is shipped with the sysem. They implement types considered essential for end-user applications, such as color, date, daytime, time, timestamp, timezone, blob, and web supportive types inet, url and json. They are implemented using the type extension mechanism described below. As such, they provide examples for future extensions. A concrete example is the 'blob' datatype in the MonetDB atom module library(see ../modules/atoms/blob.mx)

Type resolutions

MAL is mostly a strongly typed language. Given the interpretative nature of many of the MAL instructions, when and where type resolution takes place is a critical design issue. Performing it too late, i.e. at each instruction call, leads to performance overhead if we derive the same information over and over again. However, many built-in operators have polymorphic typed signatures, so we cannot escape it altogether. Consider the small illustrative MAL program:

     function sample(nme:str, val:any_1):bit;
        c := 2 * 3;
        b := bbp.bind(nme);  #find a BAT
        h := algebra.select(b,val,val);
        t := aggr.count(h);
        x := io.print(t);
        y := io.print(val);
     end sample;

The function definition is polymorphic typed on the 2nd argument, it becomes a concrete type upon invocation. The system could attempt a type check, but quickly runs into assumptions that generally do not hold. The first assignment can be type checked during parsing and a symbolic optimizer could even evaluate the expression once. Looking up a BAT in the buffer pool leads to an element :bat[:oid,tt] where tt is a runtime dependent type, which means that the selection operation can not be type-checked immediately. It is an example of an embedded polypmorphic statement, which requires intervention of the user/optimizer to make the type explicit before the type resolver becomes active. The operation count can be checked, if it is given a BAT argument. This assumes that we can infer that 'h' is indeed a BAT, which requires assurance that algebra.select produces one. However, there are no rules to avoid addition of new operators, or to differentiate among different implementations based on the argument types. Since print(t) contains an undetermined typed argument we should postpone type checking as well. The last print statement can be checked upon function invocation.

Life becomes really complex if the body contains a loop with variable types. For then we also have to keep track of the original state of the function. Or alternatively, type checking should consider the runtime stack rather than the function definition itself.

These examples give little room to achieve our prime objective, i.e. a fast and early type resolution scheme. Any non-polymorphic function can be type checked and marked type-safe upon completion. Type checking polymorphic functions are post-poned until a concrete type instance is known. It leads to a clone, which can be type checked and is entered into the symbol table.

Defining your own types

For the courageous at heart, you may enter the difficult world of extending the code. The easiest way is to derive the atom modules from one shipped in the source distributed. More involved atomary types require a study of the atom structures (gdk_atoms), because you have to develop a handful routines complying with the signatures required in the kernel library. They are registered upon loading the atom module.