[Monetdb-developers] Question about milprint_summer

Peter Boncz P.Boncz at cwi.nl
Fri Apr 20 14:19:15 CEST 2007


Hi Maurice,

> For my research on probabilistic XML, I need some additional 
> functions in XQuery implemented in mil. By means of copying 
> code from existing functions in milprint_summer, I've been 
> able to get some things working, but I still do not have the 
> feeling that I understand the meaning and purpose of all the 
> bats. For example, what's this ipik bat? There seem to be two 
> interfaces (NORMAL and VALUES) that pass data from one 
> subexpression to the next. My question is (I guess): which bats 
> are involved in each of the interfaces and which conditions 
> hold and do I need to adhere to wrt these bats? Is there some
> documentation around about this? I checked the pathfinder Wiki, 
> but it, for example, gives no result for a search for "ipik".

Jan wrote the following indications for writing functions in mps:

http://www.pathfinder-xquery.org/wiki/index.php/Function_implementation

The 'ipik' thing is not described there, because it is newer. It stands for
iter-pos-item-kind, the four bats crucial to the relational sequence
representation, each of which is a BAT[void,T].

At some point, Stefan and I introduced support for constants. Since then,
each variable (iter-pos-item-kind) can possibly be a MIL constant. However,
one of them *must* still be a BAT. But which? The purpose of the 'ipik' is
to identify that BAT variable. That is, it is assigned to either iter, pos,
item or kind.

Then, the hairy part is whether your MIL code is constant-resistant.
Obviously, if you can operate on the constants directly, the query tends to
be faster. We provide support for constants in many pf_support.mx MIL procs
(e.g. get_container()), as well as MIL maps (e.g. [+](1,2) just works, i.e.
multiplex allows constant-only parameters). A part of the MIL bat algebra
(join, min, max, texist, etc.) is supported on constants as well, see
MonetDB4/src/modules/plain/constant.mx

If you do not want to bother optimizing your code for constants, you can
always use the 'materialize' command to inflate any constant back into a
bat:

# enforce all variables to be bats (no effort if already so)
iter := iter.materialize(ipik);
pos  :=  pos.materialize(ipik);
item := item.materialize(ipik);
kind := kind.materialize(ipik);
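
To make the idea concrete, here is a toy model in Python (not MIL, and not
the actual MonetDB implementation). It assumes a BAT[void,T] can be pictured
as a plain list and a MIL constant as a bare scalar. The helper names
is_bat() and pick_ipik() are invented for this sketch, as is the rule of
picking the first variable that is a column; the string "INT" is only a
placeholder for a kind value.

```python
# Toy model only: a "BAT[void,T]" is a Python list, a MIL constant is a scalar.

def is_bat(v):
    """True if v plays the role of a BAT (a column) in this model."""
    return isinstance(v, list)

def materialize(v, ipik):
    """Inflate a constant into a column as long as ipik; leave columns alone."""
    if is_bat(v):
        return v                    # no effort if already a column
    return [v] * len(ipik)

def pick_ipik(iter_, pos, item, kind):
    """Hypothetical helper: return the first variable that is a real column."""
    for v in (iter_, pos, item, kind):
        if is_bat(v):
            return v
    raise ValueError("one of iter, pos, item, kind must be a BAT")

# A sequence of three items where pos and kind happen to be constants.
iter_ = [1, 2, 3]
pos = 1
item = [10, 11, 12]
kind = "INT"                        # placeholder label, not a real kind code

ipik = pick_ipik(iter_, pos, item, kind)

# Same pattern as the MIL snippet above: afterwards everything is a column.
iter_ = materialize(iter_, ipik)
pos = materialize(pos, ipik)
item = materialize(item, ipik)
kind = materialize(kind, ipik)
```

In this model the columns that already existed come back untouched, and only
pos and kind are expanded to the length of ipik.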

As for the two "rc" modes in mps (VALUES, NORMAL):
- NORMAL should always be used for heterogeneously typed sequences. The
'item' is a BAT[void,oid], where the tail OID refers to a value container
(e.g. str_values). Booleans and nodes do not have a container, BTW. Thus, we
have str_values/int_values and dbl_values (= dec_values).

- the VALUES representation uses 'item_str', 'item_int' resp. 'item_dbl'
variables instead. It can be used for homogeneous sequences (typically,
'kind' is a constant then), and those variables directly contain values of
the desired type, e.g. BAT[void,lng] for integers (yes: XQ integers are
longs!).


- translate2MIL() can be called with the request VALUES. It then *may*
produce a result in VALUES representation, returning the type in that case.
E.g. for integer-typed results in VALUES representation, it would return
INT. But, even if you request VALUES, translate2MIL may still return NORMAL.
In case translate2MIL is called with rc==NORMAL, it must produce NORMAL.

- if you have the VALUES representation, and want to go to NORMAL, use the
MIL proc addValues(), generated in mps by the convenience function
addValues.
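
The difference between the two representations can be pictured with another
toy model in Python (again not MIL, and not the real addValues() proc). It
assumes a value container is a list and an oid is simply an index into it.
The function add_values() below is a stand-in written for this sketch; it
always appends, and whether the real proc reuses existing container entries
is not modelled here.

```python
# Toy model: a value container is a list, an oid is a position in that list.

def add_values(container, values):
    """Append each value to the container and return the oids referring to them."""
    oids = []
    for v in values:
        container.append(v)
        oids.append(len(container) - 1)
    return oids

# The container already holds one integer from some earlier expression.
int_values = [7]

# VALUES representation: the tail holds the integers themselves.
item_int = [40, 41, 42]

# NORMAL representation: the tail holds oids pointing into int_values.
item = add_values(int_values, item_int)

# Going back from oids to values is a lookup in the container.
resolved = [int_values[oid] for oid in item]
```

Here item ends up holding references rather than integers, and following
those references through int_values gives back the original item_int.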

Good luck,

Peter





