Query processing of pre-partitioned data using Sandwich Operators

From MonetDB
Jump to navigationJump to search

One Author: Stephan Baumann


Use pre-ordering and pre-grouping to be able to sandwich operators between two new operators, PartitionRestart and PartitionSplit, which reduce the size of data given to the original operators so that space and time complexity are reduced.

The previously optimised operators are left unchanged.


Physical Relation
An ordering on a relation R of tuples t_i.
Order Property
An ordering on a relation R defined on a subset {A_1,...,A_n} of the attributes of R. It is given by A_1^\alpha_1 \rarrow ... \rarrow A_n^\alpha_n. \alpha_i \in {O,G}, defining either an ordering or a grouping.
Group Identifier
An extra attribute _groupID_ can be added to these tables, based on the above order properties.

Usage by the Relational Operators[edit]

Let R be a relation, and P an ordering property on that relation.

GROUP BY[edit]

Given a relation R and an ordering property P and a set of attributes A by which to group the table, find the longest prefix of P that forms a subset of A. Given that the relation R is ordered on the property P, the table may be hierarchically split on P, flushing the hash table on the remaining attributes of the group by on the completion of every group from P. This reduces memory and CPU consumption.

The operation may be split in this way, because no two groups in the result may have come from a single group defined by P.


Suppose a prefix Q of the sort keys, which is a fully ordered ordering property on R, is also a prefix of P. Then the sort operation may work group-wise on the groups defined by that prefix, ordering each group by the remainder of the sort keys.

Hash Join[edit]

Let S be another relation, stored with ordering property Q. Suppose that R is to be joined to S using the join keys K. Let K_s be the largest subset of K with the property that an ordering property made of the elements of that subset is a prefix of both P and Q. A group-wise merge may then be performed in the joining of R and S.

The Sandwich Operators[edit]

The operators are PartitionSplit and PartitionRestart. The idea is that a given relational operator is sandwiched between PartitionSplit and PartitionRestart. For unary operators, PartitionSplit goes between the input of the operator and the operator itself. PartitionRestart comes between the output of the operator and the operator of which the sandwiched operator is the input. Each sandwich operator as a .Next() method, which sends a vector, of length n, of tuples to the parent. The .Next() method will send vectors of tuples until a group boundary is seen, at which point it well send only the remaining tuples in the group and signal an end of stream.


PartitionSplit is responsible for splitting the input of an operator by groupIDs. The granularity over which to split is determined by the context. PartitionSplit will send the tuples of the current group until a group boundary is observed, at which point it will signal an end of stream.


PartitionRestart will forward the output tuples of the operator in vectors of length n until an end of stream is received, which will have originated from the PartitionSplit. It forwards this end of stream to the parent. On the next call of .Next() to PartitionRestart, the corresponding PartitionSplit will be restarted, and so will the sandwiched operator. The output of the child is then forwarded again unless the stream really has ended.

Increase in Speed[edit]


The complexity, normally O(N * Log(N)), decreases to O(\gamma * ((N/\gamma) * Log(N/\gamma)), where \gamma is the number of groups. The idea, however, of splitting into groups is to reduce the amount of data that needs to be held in the memory hierarchy at a given time. The smaller the data being processed, the more that cache levels closer to the CPU can be exploited. This in turn greatly reduces computation time. Therefore if we have a cost model that takes the memory hierarchy into account, we should apply this to the reduced data size (N/\gamma), and multiply this \gamma times.

The hope, for instance, for joins is that groups are just small enough so that when two of them are joined, the hash tables fit into the L1 cache.