C-Store: A Column-oriented DBMS

From MonetDB
Jump to navigationJump to search

An author: Mike Stonebraker


The paper outlines a complete column-based DBMS.

Most database systems are row-based, which allows efficient writes, which is suited to OLTP (online transaction processing).

Column-stores are read-optimised, storing each attribute of a table in its own separate column. This fact allows easier compression of overall data, by looking at single columns alone.

Overview of the System[edit]

A table is represented in multiple "projections". A projection is group of the columns of a table, sorted on a certain attribute. Multiple projections of a table may exist in order to aid in query optimisation, being divided over nodes in a grid of computers in order for the tables to be available even if a node containing a certain projection goes down.

The system is made from a read-optimised column store (RS), a write-optimised column store (WS) and a "tuple mover" from the latter to the former. When a write is made to the system, the new elements are moved from the WS to the RS. The tuple mover takes care of correctly updating the various projections, and of the sort orders.


A projection of a table T is a projection of that table, possibly with columns from other tables which are connected to T by a foreign key path. A projection's columns are horizontally partitioned into segments, each row in a segment having its own storage key. (s: Segment ID, k: Storage Key).

The union of columns of every projection of a table T must include the columns of T. Projections may be joined with join indexes.

Join Index
Projections T1 and T2 may be joined by common attributes. Defined by a table, for each segment M of T1 and each row R of M, of tuples of the form (s, k) where s is a segment of T2 and k is a storage key in that segment. A join index may be viewed as a re-ordering of the segments of T1.

The system aims to find the segmentation and projection of the data that optimises performance and storage safety over the nodes which it controls. This is done based on a typical useage plan given by the DBA.

The components of the System[edit]


Columns are compressed based on how they are sorted, and the number of distinct values in columns.


Columns are stored as in RS, but also with explicit storage keys. Uncompressed because writing needs to be efficient.