Skip to main content

Data Partitioning

The prime method to enable finer control of locality of data access during query evaluation is to horizontally partition the tables in a database. This is made possibly by introducing the concept of MERGE TABLE, which allows a virtual table to be defined as the union of its partitions.

Creating MERGE TABLE-s

The SQL statements below show how to create a MERGE TABLE mt containing two partitions t1 and t2. Then the partition tables can be queried both individually and jointly.

    -- Create partition tables with identical column types, and add some values.
    -- The column names are not significant.
    CREATE TABLE t1 (i int);
    CREATE TABLE t2 (j int);
    INSERT INTO t1 VALUES (11), (13);

    INSERT INTO t2 VALUES (23), (27);

    -- Create a MERGE TABLE
    CREATE MERGE TABLE mt1 (t int);

    -- Add the partition tables into the MERGE TABLE
    ALTER TABLE mt1 ADD TABLE t1;
    ALTER TABLE mt1 ADD TABLE t2;

    sql> -- sanity check
    sql> SELECT count(*) FROM t1;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (2.231ms)
    sql> SELECT count(*) FROM t2;
    +------+
    | L1   |
    +======+
    |    2 |

    +------+
    1 tuple (2.226ms)
    sql> SELECT count(*) FROM mt1;
    +------+
    | L1   |
    +======+
    |    4 |
    +------+
    1 tuple (3.455ms)

Two MERGE TABLE can contain overlapping partition tables. A MERGE TABLE can also contain other MERGE TABLE-s:

    CREATE TABLE t3 (k int);
    INSERT INTO t3 VALUES (31), (37);

    CREATE TABLE t4 (l int);
    INSERT INTO t4 VALUES (41), (47);

    -- An overlapping MERGE TABLE (with mt1)
    CREATE MERGE TABLE mt2 (t int);

    ALTER TABLE mt2 ADD TABLE t1;
    ALTER TABLE mt2 ADD TABLE t3;

    -- A MERGE TABLE of MERGE TABLE
    CREATE MERGE TABLE mt3 (t int);
    ALTER TABLE mt3 ADD TABLE mt1;

    ALTER TABLE mt3 ADD TABLE t4;

    sql> -- sanity check
    sql> SELECT * FROM mt2;

    +------+
    | t    |
    +======+
    |   11 |
    |   13 |
    |   31 |
    |   37 |
    +------+
    4 tuples (2.677ms)
    sql> SELECT * FROM mt3;
    +------+
    | t    |
    +======+
    |   11 |
    |   13 |
    |   23 |
    |   27 |
    |   41 |
    |   47 |
    +------+
    6 tuples (2.496ms)

Updating MERGE TABLE-s

As of release Apr2019 (11.33.3) we support updatable merge tables, see updatable-merge-tables for details.

Previously, a MERGE TABLE could not be directly updated. All updates had to be done on the individual partition tables:

    sql> -- All updates must go directly into the individual partition table
    sql> INSERT INTO mt1 VALUES(1);
    INSERT INTO: cannot insert into merge table 'mt1'
    sql> UPDATE mt1 SET t=10 where t=11;
    UPDATE: cannot update merge table 'mt1'
    sql> UPDATE t1 SET i=10 where i=11;
    1 affected row (4.081ms)

One can remove a partition table from a MERGE TABLE using an ALTER TABLE statement. This also enables dropping a partition table, while keep the merge table.

    -- Remove a partition table from a MERGE TABLE
    ALTER TABLE mt1 DROP TABLE t1;

    sql> -- sanity check
    sql> SELECT count(*) FROM mt1;
    +------+
    | L1   |
    +======+
    |    2 |
    +------+
    1 tuple (1.455ms)

When dropping a partition table, all MERGE TABLE-s that depend on it must be dropped first:

    sql> -- Drop table must happen in the correct order of dependency
    sql> DROP TABLE t2;
    DROP TABLE: unable to drop table t2 (there are database objects which depend on it)
    sql> DROP TABLE mt1;
    DROP TABLE: unable to drop table mt1 (there are database objects which depend on it)
    sql> DROP TABLE mt3;
    operation successful (3.048ms)
    sql> DROP TABLE mt1;
    operation successful (1.948ms)
    sql> DROP TABLE t2;
    operation successful (2.329ms)

Improving performance of search queries on MERGE TABLE-s

When performing for example a simple SELECT ... WHERE query on a merge table, logically the query has to be forwarded to each partition table. However based on the predicates in the WHERE clause it is possible for the query optimizer to decide which partition tables have to be scanned or can be skipped when the optimizer has knowledge of certain statistical properties of the partition tables of a MERGE TABLE. To prepare these statistics, the user has to run the ANALYZE command over a MERGE TABLE.