[MonetDB-users] VOIDs and multiple BATs

Stefan Manegold Stefan.Manegold at cwi.nl
Mon May 9 10:04:43 CEST 2005


Hi Ed,

... finally, I'm back ...

If you store a relational tuple (say <a1,a2,a3>) in VOID-headed BATs
(say A1:BAT[VOID,any], A2:BAT[VOID,any], A3:BAT[VOID,any]), you don't have to
worry about syncedness at all, at least as far as maintaining the information
that the BATs are synced and the resulting internal optimizations are concerned.
MonetDB takes care of it as far as necessary/possible: the seqbase and the
number of BUNs/tuples (count()) are sufficient information to derive
syncedness. As mentioned in my last mail, you/the front-end are "responsible"
for making sure that appends/updates are done consistently.

The same holds for OID columns that form a dense, ascending sequence of
OIDs. Internally, MonetDB treats these just like VOIDs (using the first
(i.e., smallest) OID as seqbase).
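
To make the rule above concrete, here is a small Python sketch. It is a toy
model, not MonetDB code: the names VoidColumn, dense_seqbase, as_void and
synced_by_shape are made up for illustration, and treating an empty column as
"not dense" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoidColumn:
    """Toy stand-in for a VOID head: only seqbase and count are kept."""
    seqbase: int
    count: int


def dense_seqbase(oids: List[int]) -> Optional[int]:
    """Return the seqbase if `oids` is dense and ascending, else None.

    Assumption of this sketch: an empty column has no seqbase.
    """
    if not oids:
        return None
    first = oids[0]
    for position, oid in enumerate(oids):
        if oid != first + position:
            return None
    return first


def as_void(oids: List[int]) -> Optional[VoidColumn]:
    """Describe a dense, ascending OID column the same way as a VOID column."""
    seqbase = dense_seqbase(oids)
    return None if seqbase is None else VoidColumn(seqbase, len(oids))


def synced_by_shape(a: VoidColumn, b: VoidColumn) -> bool:
    """Two such columns are synced iff seqbase and count both agree."""
    return a.seqbase == b.seqbase and a.count == b.count
```

In this model, as_void([0, 1, 2]) compares as synced with VoidColumn(0, 3),
while an OID column such as [5, 7, 8] has no seqbase and cannot take part in
this kind of check.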

Apart from these cases, syncedness can (and does) only play a role with
derived BATs. Suppose there is a [OID,int] BAT B and you derive a BAT C with
all tail values incremented by 1 ("C := B [+] 1"); then MonetDB recognizes
that both BATs B & C have the same OID sequence in the head, and hence
marks them synced. Once you update either or both BATs, the syncedness gets
destroyed: There is no logic in MonetDB that would recognize that, e.g.,
"B.append(123@0,123); C.append(123@0,124);" does actually maintain the
syncedness.
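
The behaviour just described can be mimicked with a toy Python model. This is
only a sketch under assumptions: ToyBAT, map_tail, known_synced and the
integer "token" used as a marker are hypothetical names, not MonetDB
internals or MIL functions.

```python
import itertools

# Hypothetical source of fresh marker values for this sketch.
_fresh_token = itertools.count(1)


class ToyBAT:
    """Toy BAT with explicit head and tail lists and an optional sync marker."""

    def __init__(self, heads, tails):
        self.heads = list(heads)
        self.tails = list(tails)
        self.sync_token = None  # None: not known to be synced with anything

    def map_tail(self, fn):
        """Derive a BAT with the same head column, like C := B [+] 1."""
        derived = ToyBAT(self.heads, [fn(t) for t in self.tails])
        if self.sync_token is None:
            self.sync_token = next(_fresh_token)
        derived.sync_token = self.sync_token  # both now carry the same marker
        return derived

    def append(self, head, tail):
        """Any update drops the marker; the model never tries to keep it."""
        self.heads.append(head)
        self.tails.append(tail)
        self.sync_token = None


def known_synced(a, b):
    """True only if both BATs still carry the same marker."""
    return a.sync_token is not None and a.sync_token == b.sync_token


B = ToyBAT([10, 42, 7], [1, 2, 3])
C = B.map_tail(lambda v: v + 1)
synced_after_derive = known_synced(B, C)

B.append(123, 123)
C.append(123, 124)
synced_after_append = known_synced(B, C)
```

After the two appends the head columns are still identical, but the model (like
the description above) no longer knows that, so known_synced(B, C) is false.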

> From the bat documentation:
> 
> "For BAT-sets that contain exactly the same sets of OIDs, those
> OID-columns can be marked synced(). This information is then used to use
> positional lookup (the i-th element in BAT1 corresponds with the i-th in
> BAT2) and to turn semijoin- into very cheap copy-operations, etc."
> 
> almost makes it sound as if you need synced() to actually test and set this
> bit. Is this true? How is the "syncedness" ensured when I have a BAT[OID,
> any], or determined?

As mentioned earlier, the syncedness is only used internally. There is no
way to set it explicitly from MIL.
For VOID columns and dense, ascending OID columns, syncedness is detected
only from their seqbase and #BUNs (see above and below).

> From the bat documentation:
> 
> "This is implemented by storing a very large OID for each column. An
> update to the column destroys this OID."
> 
> From your explanation, VOIDs are handled by seqbase and # of buns. For OID
> columns in a BAT (ie, BAT[OID, any]), is this just the first BUN?

Note that the "very large OID" that is used to identify synced BATs is NOT the
same as their seqbase(s). In the case of VOID or dense, ascending OID columns,
syncedness is given if two columns have the same seqbase & same length
(#BUNs); in this case, the "very large OID" is not needed/used at all.
In other cases (see, e.g., the "[+]" example above), MonetDB assigns both
BATs the same "very large OID" to record their syncedness. Thus, in later
operations that use two BATs that have the same "very large OID", MonetDB
knows (assumes) that these BATs are synced.

[For dense, ascending OID columns, the seqbase is indeed the value of the
 first BUN, i.e., the smallest value in that column (see above).
 For all other OID columns, there obviously is no seqbase.]

> For update operations, does this mean that in the middle of inserting into
> a set of BATs, that as the BUN count is different, a concurrent query may
> not realize that the two BATs are synced? (this gets back to multiple
> client isolation for changes)

Since there is no way to detect "syncedness maintaining updates" in MonetDB,
explicitly marked syncedness has to be "un-marked" (by "destroying" the
"very large OID") on every update.

In general, in cases like yours, i.e., updates of a set of BATs that represent
the attributes of a wide (relational) table, syncedness can only be maintained
(and hence exploited) by using head columns that are either VOID or at least
dense, ascending OIDs. (If your scenario is append-only, VOID columns are
"perfect"!)
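
As a sketch of the append-only case, the following toy Python model stores one
VOID-headed column per attribute and reconstructs a tuple by position. The
classes VoidBAT and ToyTable and their methods are hypothetical names for
illustration only; the point is that the front-end appends to every column
consistently, so seqbase and count stay equal.

```python
class VoidBAT:
    """Toy VOID-headed column: head OIDs are implied as seqbase, seqbase+1, ..."""

    def __init__(self, seqbase=0):
        self.seqbase = seqbase
        self.tails = []

    def count(self):
        return len(self.tails)

    def fetch(self, oid):
        """Positional lookup: the OID maps directly to an array index."""
        position = oid - self.seqbase
        if not 0 <= position < len(self.tails):
            raise KeyError(oid)
        return self.tails[position]


class ToyTable:
    """A wide table stored as one VoidBAT per attribute (at least one column)."""

    def __init__(self, n_columns, seqbase=0):
        self.columns = [VoidBAT(seqbase) for _ in range(n_columns)]

    def append(self, row):
        """The front-end's job: append one value to every column, or to none."""
        if len(row) != len(self.columns):
            raise ValueError("row width does not match number of columns")
        for column, value in zip(self.columns, row):
            column.tails.append(value)

    def synced(self):
        """Syncedness derived from seqbase and count alone."""
        first = self.columns[0]
        return all(
            c.seqbase == first.seqbase and c.count() == first.count()
            for c in self.columns
        )

    def tuple_at(self, oid):
        """Rebuild the tuple <a1,...,an> for one OID by positional lookup."""
        return tuple(c.fetch(oid) for c in self.columns)


table = ToyTable(3, seqbase=100)
table.append((1, "a", 1.5))
table.append((2, "b", 2.5))
row = table.tuple_at(101)
```

If one column were appended to without the others, synced() would report false
in this model, which is the inconsistency the front-end has to avoid.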

Syncedness as described in the documentation with the "very large OID"
is only applicable for derived BATs (as described above) with the [V]OID
columns not being modified.

To summarize: in your case, you don't have to worry about syncedness!
If you can/do use VOID-headed BATs (or at least OID-headed BATs with the
OIDs forming dense, ascending sequences), you get syncedness "for free".
Otherwise, there is no way to maintain/use syncedness at all.

I hope this helps.

Please let us know if you have more questions, problems, comments, etc.
concerning MonetDB!

Regards,

Stefan

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |



