[Monetdb-developers] The new "ws-API", Algebra & XRpc (& PFtijah)

Stefan Manegold Stefan.Manegold at cwi.nl
Thu Oct 5 11:03:09 CEST 2006

Dear Peter, dear fellow PF & MXQ developers,

I'm not sure whether I understand the new "ws-API" correctly, and would
be grateful if one of you could enlighten me.

First, here is my understanding so far (please correct me if/where I'm wrong!),
followed by concrete open questions.

For transaction support, Peter introduced a new wsid that uniquely
identifies (a reference to?) a ws, which is required for conflict detection
(as far as I understand).

A wsid is to be generated from/for a ws by calling wsid := ws_id(ws);
however, this generates a new unique id each time it is called (two calls on
the same ws yield two different ids, right?), hence, wsid := ws_id(ws)
should immediately follow ws := ws_create() .

For the inverse, there is ws := ws_bat(wsid), which returns the ws identified
by wsid; obviously, this function yields the same result each time it's
called with the same wsid.
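
The relationship between the three calls, as understood above, can be
sketched with a toy model. This is Python, not MIL, and not the actual
runtime code: the registry dict, the id counter and the dict used as a "ws"
are assumptions made up for illustration. Only the names ws_create, ws_id and
ws_bat come from the API under discussion.

```python
import itertools

# Toy stand-ins for the runtime calls discussed above; the real
# implementations live in the runtime and may well behave differently.
_next_unique = itertools.count(1)
_registry = {}              # wsid -> ws (hypothetical bookkeeping)


def ws_create():
    """Return a fresh working set, modelled here as a plain dict."""
    return {}


def ws_id(ws):
    """Register ws under a NEW unique id on every call (as assumed above)."""
    wsid = next(_next_unique)
    _registry[wsid] = ws
    return wsid


def ws_bat(wsid):
    """Inverse lookup: the same wsid always yields the same ws."""
    return _registry[wsid]


ws = ws_create()
wsid = ws_id(ws)            # call once, right after ws_create()
assert ws_bat(wsid) is ws
assert ws_bat(wsid) is ws_bat(wsid)
```

If this reading is right, a second ws_id(ws) on the same ws would hand out a
different id, which is why the call belongs directly after ws_create().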

Since some functionality requires the wsid instead of "only" the ws,
Peter has changed the signatures/API of some runtime PROCs. As far as I see
right now, these are mainly (only?)
	ws_doc(ws, ... 		->  ws_opendoc(wsid, ...
	ws_addcoll(ws, ... 	->  ws_opencoll(wsid, ...
	ws_destroy(int(ws)) 	->  ws_destroy(wsid)

I finished these changes by simply running the following one-liners:

find * -type f | xargs grep -l 'var ws := ws_create();' | xargs perl -i -p -e 's|(var ws := ws_create\(\);)|$1 var wsid := ws_id(ws);|'
find * -type f | xargs grep -l 'ws_doc(ws,'             | xargs perl -i -p -e 's|ws_doc\(ws,|ws_opendoc(wsid,|'
find * -type f | xargs grep -l 'ws_destroy(int(ws))'    | xargs perl -i -p -e 's|ws_destroy\(int\(ws\)\)|ws_destroy(wsid)|'
find * -type f | xargs grep -l 'ws_addcoll(ws,'         | xargs perl -i -p -e 's|ws_addcoll\(ws,|ws_opencoll(wsid,|'
find * -type f | xargs grep -l 'ws_addcoll'             | xargs perl -i -p -e 's|ws_addcoll|ws_opencoll|'
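
To see what these substitutions do to a single line, here is a small Python
sketch that applies the same patterns in the same order. The sample lines are
assembled from fragments quoted in this mail and are not taken from any
actual test file. As in the Perl commands (which have no /g flag), each rule
is applied at most once per line.

```python
import re

# The same substitutions as the Perl one-liners, in the same order.
RULES = [
    (r"(var ws := ws_create\(\);)", r"\1 var wsid := ws_id(ws);"),
    (r"ws_doc\(ws,", "ws_opendoc(wsid,"),
    (r"ws_destroy\(int\(ws\)\)", "ws_destroy(wsid)"),
    (r"ws_addcoll\(ws,", "ws_opencoll(wsid,"),
    (r"ws_addcoll", "ws_opencoll"),
]


def migrate(line):
    """Apply each rule once per line, like perl -p -e 's|..|..|' without /g."""
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line, count=1)
    return line


# Illustrative input only, built from fragments quoted in this mail.
sample = [
    "var ws := ws_create();",
    "var r := ws_doc(ws, item);",
    "ws_destroy(int(ws));",
]
migrated = [migrate(line) for line in sample]
for line in migrated:
    print(line)
```

Note that the first rule is what places wsid := ws_id(ws) directly behind
ws_create(); callers that obtain ws in any other way are not touched by it.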

While this seems to have been enough to fix all .milS tests and (most of)
the milprint_summer functionality, open issues remain in three areas:
the Algebra version, XRpc & PFtijah.

To be honest, I haven't looked at PFtijah yet; some help from the respective
experts would be appreciated.

In the cases of XRpc & Algebra, there are still places where wsid := ws_id(ws)
is IMHO called in the wrong location:

In PROC rpc_client() (runtime/xrpc.mx), I changed
	ws_addcoll(ws, ...
to
	ws_opencoll(ws_id(ws), ...
While this seems to work for now with read-only documents (i.e., in the
absence of updates and concurrency), it is IMHO incorrect in general
(since, as far as I understand, each ws_id(ws) call generates a new id).
Rather, rpc_client should receive wsid as argument instead of ws;
if necessary, ws can then be derived via ws := ws_bat(wsid).

In PROC doc_tbl() (runtime/pf_support.mx), I changed
	var r := ws_doc(ws, item);
to
	var wsid := ws_id(ws);
	var r := ws_opendoc(wsid, item);
Same story as above.
However, doc_tbl also returns ws in its result BAT-of-BATs, as far as I
understand mainly for "canonical API" reasons.
Obviously, it is not possible to replace BAT ws by lng wsid here...

Leaving the latter problem aside for the moment, the following could (have)
work(ed) as a general rule, and we should check and change the codebase
accordingly:

1) wsid := ws_id(ws) must only be called immediately after ws := ws_create() 

2) all functions/PROCs that recursively (i.e., including all transitively
   called functions/PROCs) require wsid instead of or in addition to ws should
   be modified so that they receive wsid instead of ws as an argument;
   ws can then locally be derived via ws := ws_bat(wsid) if/where necessary.

3) all functions/PROCs that recursively (i.e., including all transitively
   called functions/PROCs) "never" (at least not yet) require wsid, but are
   fine with ws can stay unchanged, i.e., getting (only) ws as argument.
   Obviously, wherever these functions need to maintain a signature/API
   aligned with those that fall under point 2) above, we should also change
   these functions as described above with 2).
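
Why rule 2) matters can be sketched with the same kind of toy model (Python
again, not MIL). The "opened" dict, the document names and the two callee
functions are hypothetical; the sketch also assumes that ws_id(ws) really
returns a fresh id on every call.

```python
import itertools

_ids = itertools.count(1)
_ws_by_id = {}                  # wsid -> ws (hypothetical bookkeeping)
opened = {}                     # wsid -> documents opened under that id


def ws_create():
    return {}


def ws_id(ws):
    wsid = next(_ids)           # assumed: a fresh id on every call
    _ws_by_id[wsid] = ws
    return wsid


def ws_bat(wsid):
    return _ws_by_id[wsid]


def ws_opendoc(wsid, name):
    """Toy version: only records which id the document was opened under."""
    opened.setdefault(wsid, []).append(name)


def callee_by_ws(ws, name):
    """Against rule 2: derives an id locally and thus gets a different one."""
    ws_opendoc(ws_id(ws), name)


def callee_by_wsid(wsid, name):
    """Rule 2: receives wsid; ws is derived locally where needed."""
    ws = ws_bat(wsid)
    ws_opendoc(wsid, name)
    return ws


ws = ws_create()
wsid = ws_id(ws)                # rule 1: directly after ws_create()
callee_by_wsid(wsid, "a.xml")   # recorded under the query's wsid
callee_by_ws(ws, "b.xml")       # recorded under a second, unrelated id
print(opened)
```

In this model the second document ends up under an id the caller never sees,
which is the kind of mismatch that would presumably defeat conflict detection.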

As indicated above, this does not work for doc_tbl and its companion PROCs
in runtime/pf_support.mx that make up the interface between the Algebra
version of the compiler and the runtime.
I assume the respective canonical API could be changed so that the
respective PROCs get wsid instead of (or in addition to?) ws as arguments.
However, we cannot easily change the API to return wsid (lng) instead of ws
(BAT) in the result BAT-of-BATs. Here, I see three solutions; the last
one actually comes from JanR:

a) Is ws/wsid indeed required in the result?
   If not, we could simply discard it.

b) Wrap wsid temporarily into a ("fake") [void,lng] BAT containing only one
   (nil,wsid) BUN.

c) (JanR) Put the wsid inside ws, basically as a ("fake") BAT [void,lng].
   Kind of the "encapsulated" solution, and not only for the algebra-runtime
   interface.
   Consequently, we should have ws_create() call ws_id(ws) internally and
   store the result in ws (this could save the seemingly "redundant"
   ws := ws_create(); wsid := ws_id(ws); sequence), and all (most, except at
   least ws_destroy()) signatures/API could be kept / re-unified to pass
   (only) ws (now including wsid) instead of wsid as argument.

   There seems to be one problem, though, if I understand Peter's comment:
   - ws-IDs are now lng-s (combination of *unique* ID and bat-ID) such IDs
     are meaningful also after the query is done (and ws-bat deallocated).
                    ^1^1^1^1^1^1^1^1^1^1^1^1^1^1  ^2^2^2^2^2^2^2^2^2^2^2 
     this is needed for trans mgmt.
     All meta-bats with a ws-id field now hold lng instead of int.

   In other words, (^2) wsid needs to be available also if ws is gone.
   I'm not sure, though, whether it has to be available in the global wsid
   variable (i.e., inside/during the query), or (only) in some meta-bats
   outside (and hence after) the query (-context) (as suggested by ^1).


   Peter, could you please enlighten me/us here?

   In case "only" (^1) is required, the "encapsulated" solution seems indeed
   to be suitable for all code.
   Otherwise, it would at least be a local solution for the algebra-runtime
   interface (in case (a) & (b) are no option), but bears the burden of
   maintaining consistency between the global wsid variable and the wsid stored
   inside the ws.
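
A minimal sketch of the "encapsulated" option (c), again as a Python toy model
rather than MIL. The "meta" dict stands in for meta-bats that outlive the
query, and a list holding one (nil, wsid) pair stands in for the "fake"
[void,lng] BAT; both are assumptions for illustration, as is the choice to
keep ws_destroy() on wsid.

```python
import itertools

_ids = itertools.count(1)
meta = {}       # hypothetical stand-in for meta-bats that outlive the query
NIL = None      # stand-in for the nil head value of a [void,lng] BAT


def ws_create():
    """Option (c): allocate the id inside ws_create() and store it in ws."""
    wsid = next(_ids)
    ws = {"wsid": [(NIL, wsid)]}    # "fake" BAT with a single (nil,wsid) BUN
    meta[wsid] = "active"
    return ws


def ws_id(ws):
    """Now a pure lookup: repeated calls return the same id."""
    return ws["wsid"][0][1]


def ws_destroy(wsid):
    """Keeps taking wsid, as suggested for ws_destroy() in (c)."""
    meta[wsid] = "done"


ws = ws_create()
wsid = ws_id(ws)
assert ws_id(ws) == wsid        # no longer a fresh id per call
ws_destroy(wsid)
del ws                          # the ws itself is gone ...
assert meta[wsid] == "done"     # ... but the id is still meaningful outside
```

In this sketch the id stays usable after the ws is gone only because it is
also recorded outside the ws, which corresponds to the (^1) reading above.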

I hope this email and any reactions to it help to clarify the current
situation, rather than blur it even more ...


| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |

