The MonetDB team at CWI/MonetDB BV is pleased to announce the
Apr2011-SP2 bug fix release of the MonetDB suite of programs.
This is the first release with support for Fedora 15. Fedora 13 is no
longer supported.
One important bug fix has to do with columns with string (char/varchar)
values. There was a bug that could cause exponential growth of the file
holding the data for the column. This bug has been fixed. If you have
a database that is much larger than you would expect for the amount of
data, this might be the cause. The remedy is to dump and restore that
database (or just the affected tables). Apart from using up too much
space, this bug didn't have any other consequences.
More information (including release notes) on this release is available
at <http://www.monetdb.org/OldReleaseNotes/Apr2011>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
--
Sjoerd Mullender
I have created a new stable branch for the next feature release. This
branch was created in order to stabilize the code before the feature
release.
>From now on *only* bug fixes may go to this new stable branch. I
reserve the right to undo any changes that do not comply with this
rule. I also reserve the right to admit changes that are not strictly
bug fixes.
The name for the branch is Aug2011. So in each of your stable
checkout directories you can do
hg pull
hg update -rAug2011
to update to this new branch.
--
Sjoerd Mullender
Hi Tim,
[I prefer to keep the mailing list involved to share the discussion;
I hope you don't mind ... ;-) ]
On Mon, Jul 04, 2011 at 07:30:36PM +0200, Tim Speier wrote:
> Hello Stefan,
>
> thanks for your fast reply.
>
> I am especially interested in tuple reconstruction mechanisms in
> column-oriented databases. So I'm not a user.
I see;
please also consider reading our papers (cf.,
http://repository.cwi.nl/search/searchrepository.php?stheme=INS1),
in particular
http://repository.cwi.nl/search/fullrecord.php?publnr=14290http://repository.cwi.nl/search/fullrecord.php?publnr=11117
and/or these PhD these:
http://repository.cwi.nl/search/fullrecord.php?publnr=14832http://repository.cwi.nl/search/fullrecord.php?publnr=14301http://repository.cwi.nl/search/fullrecord.php?publnr=16706
Basically, "even" in main memory, sequential access is fast, while random
access is slow.
>
> > Further, for any set of BATs the represent the individual columns of a
> > relational table, the BATs' "heads" (first/left column) are aligned and
> > consist of OIDs that form a dense (i.e., increment == 1) ascending sequence
> > of integer numbers starting at a given offset (usually 0). Thus, these OIDs
> > can indeed be seen as tuple-/record- identifiers. Be aware, though, that
> > these OIDs are only used internally and not available as tuple-/record-
> > identifiers in SQL.
> > Given their denseness properly, these OIDs are usually not materialized
> > (read: physically stored) but rather computed on the fly, and they
> > effectively indicate (module offset) the physical position of the respective
> > record's attribute values in all BATs involved.
>
> I am sorry, but it is not yet clear to me. Let me create an example:
> A relation R with two attributes X and Y. It consists of the following
> tuples:
>
> +---+-----+
> | X | Y |
> +---+-----+
> | A | 300 |
> +---+-----+
> | B | 100 |
> +---+-----+
> | C | 400 |
> +---+-----+
>
> Now let's create two BATs: each record gets its own OID and, my naive
> approach, map each attribute values to a BAT:
>
> BAT X BAT Y
> +---+---+ +---+-----+
> |OID|VAL| |OID|VAL |
> +---+---+ +---+-----+
> | 0 | A | | 0 | 300 |
> +---+---+ +---+-----+
> | 1 | B | | 1 | 100 |
> +---+---+ +---+-----+
> | 2 | C | | 2 | 400 |
> +---+---+ +---+-----+
>
> Now I could reconstruct each tuple by its OID and the index. But I would
> loose some advantages of the column-oriented databases.
Like which?
I guess you mean having each column sorted individually?
> A more sophisticated approach would be to store values in ascending
> order like this:
>
> BAT X BAT Y
> +---+---+ +---+-----+
> |OID|VAL| |OID|VAL |
> +---+---+ +---+-----+
> | 0 | A | | 1 | 100 |
> +---+---+ +---+-----+
> | 1 | B | | 0 | 300 |
> +---+---+ +---+-----+
> | 2 | C | | 2 | 400 |
> +---+---+ +---+-----+
>
> But how can I now reconstruct the tuples efficient now?
> Is there an additional index structure (as you said in
> your foregoing email). ?
In the first case, tuple reconstruction is cheap (almost "for free"), as
columns are physically aligned, and hence, highly efficient in-order
positional lookup can be exploited for tuple reconstruction.
Obviously, with keeping the columns physically aligned does not allow to
order them individually according to their values. Thus without further
indeces, value-lookups using binary search are only possible for on sorted
column; for all other columns, some secondary index structure would be
required. MonetDB only used (automatically) hash-indeces for single-value
(point-)lookups, but resorts to (highly optimized) sequential scans for any
range lookups on non-sorted columns.
In the second case, value-lookups can be done (reasonably) efficient using
binary search on each column, however tuple-reconstruction (i.e., OID
lookups) can no longer be done with simple highly efficient in-order
positional lookups. In that case, MonetDB would (automatically on-the-fly)
build a hash-index on the non-ordered (non-dense) OID columns.
In general, there is no such thing as a "free lunch" ... ;-)
Hope this helps you further --- don't hesitate to keep asking in case you
have more questions ...
Gruß,
Stefan
>
> I'm very sorry for my innocent questions.
> Tim
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Dear all,
I'm trying to understand the tuple reconstruction algorithm in MonetDB.
Maybe someone can give me a short overview to understand it or the
place in documentation where I can do some research.
As far as I understood, the BAT consists of object-ids and values. Is
the object-identifier the same as the record- or tuple-identifier?
If it so, is there a special mechanism to position a specific oid
in a BAT or must it be scanned to find a oid?
And if not, how can I determine the corresponding values in different
BATs to reconstruct a tuple?
Or am I totally wrong and there is an obvious way to reconstruct
tuples from different BATs?
Thanks in advance,
Tim