Skip to main content

SQL:1999 grouping combinations now supported in MonetDB

As of MonetDB Jun2020 release, we have added support for ROLLUP, CUBE and GROUPING SETS features defined by the SQL:1999 standard in the GROUP BY clause. These keywords allow one to perform multiple grouping set combinations in a single statement, containing partial and total computed results. At the same time, the GROUPING aggregate, which identifies the grouping columns used in the output, has also been implemented.

The grammar is as follows:

group_by_clause:

    GROUP BY group_by_element [',' group_by_element ... ]

group_by_element:

    identifier

  | ROLLUP '(' identifier [',' identifier ... ] ')'

  | CUBE '(' identifier [',' identifier ... ] ')'

  | GROUPING SETS '(' group_by_element [',' group_by_element ... ] ')'

  | '(' ')'

The expression ‘(’ ‘)’ means a grouping without columns, i.e. a global aggregate.

In practice, whenever ROLLUP, CUBE and GROUPING SETS are used inside the GROUP BY clause, the incoming data is split for the generated grouping combinations, then each grouping is computed, then the results are returned. This can be seen as multiple GROUP BY clauses being generated, computed, then merged with UNION statements. Depending on the grouping keyword used, the generated grouping combination is different.

  • ROLLUP represents a hierarchy of columns. For n grouping columns, ROLLUP generates n+1 groupings. As an example, ROLLUP (a,b,c) translates into four grouping combinations: (a,b,c), (a,b), (a) and ().
  • CUBE represents all possible combinations of values in the supplemented columns. For n grouping columns, CUBE generates 2^n groupings. As an example, CUBE (a,b,c) translates into eight grouping combinations: (a,b,c), (a,b), (a,c), (b,c), (a), (b), (c) and ().
  • GROUPING SETS computes the list of grouping combinations supplied as they are with no changes. For n grouping columns, GROUPING SETS generates n groupings. As an example, GROUPING SETS (a,b,c) translates into three grouping combinations: (a), (b) and (c).

Nested and mixed groupings:

  • GROUPING SETS allows nesting of combinations, however, they are merged altogether. As an example, GROUPING SETS (a, GROUPING SETS (b), ROLLUP (c, d)), translates into (a), (b), (c, d), (c), ().
  • The elements of ROLLUP and CUBE may contain lists of columns, in which they are treated as a single unit. As an example, ROLLUP (a,b,c,d), which translates into (a,b,c,d), (a,b,c), (a,b), (a) and (), is different from ROLLUP ((a,b),(c,d)), which translates into (a,b,c,d), (a,b) and ().
  • If multiple grouping elements are given in the GROUP BY clause, the output combinations are the cross product of every generated sublist altogether. As an example, GROUP BY ROLLUP(a,b), c, CUBE(d,e) means ((a,b),(a),()) x (c) x ((d,e),(d),(e),()), which translates into (a,b,c,d,e), (a,b,c,d), (a,b,c,e), (a,b,c), (a,c,d,e), (a,c,d), (a,c,e), (c,d,e), (c,d), (c,e), (c).

The GROUPING aggregate has also been added to identify the grouping columns used for each row computed in the output. This aggregate takes as input identifiers for the group columns used in the GROUP BY clause. The output is bitmask indicating which of the arguments were not used in the output grouping set. The last grouping element corresponds to the least-significant bit.

Demonstration

Below, we list example queries with the added functionalities.

CREATE TABLE categories (id INTEGER, description TEXT);
CREATE TABLE sections (id INTEGER, description TEXT);
CREATE TABLE products (id INTEGER, categoryid int, sectionid INTEGER, description TEXT, price DECIMAL(6,2));
CREATE TABLE sales (productid INTEGER, sale_day DATE, units INTEGER);

INSERT INTO categories VALUES (1, 'fresh food'), (2, 'dry food'), (3, 'drinks');
INSERT INTO sections VALUES (1, 'front'), (2, 'side'), (3, 'back');
INSERT INTO products VALUES (1, 1, 1, 'apples', 1.5), (2, 1, 2, 'melons', 4.0), (3, 2, 2, 'peanuts', 2.0), (4, 3, 1, 'water', 1.0), (5, 3, 3, 'wine', 5.0), (6, 2, 3, 'walnuts', 1.5);
INSERT INTO sales VALUES (1, date '2020-03-01', 10), (2, date '2020-03-01', 3), (4, date '2020-03-01', 4), (1, date '2020-03-02', 6), (4, date '2020-03-02', 5), (5, date '2020-03-02', 2), (1, date '2020-03-03', 7), (3, date '2020-03-03', 4), (2, date '2020-03-03', 3), (5, date '2020-03-03', 1), (6, date '2020-03-03', 1);

-- In the first example we compute the total sales per product per day, then the total sales per product, and finally the overall total using the ROLLUP construct. 
-- If a column was not used for the aggregate, it gets a NULL value by default. In our query, we use COALESCE to give them a meaningful name.
SELECT COALESCE(products.description, 'all_products') AS description,
       COALESCE(sale_day, 'all_days') AS sale_day,
       totals.total
FROM (
    SELECT productid, sale_day, SUM(units * price) AS total 
    FROM products
    LEFT JOIN sales ON sales.productid = products.id
    GROUP BY ROLLUP(productid, sale_day)
) AS totals
LEFT JOIN products ON products.id = totals.productid
ORDER BY sale_day NULLS LAST, productid NULLS LAST;

description sale_day total
apples 2020-03-01 15.0000
melons 2020-03-01 12.0000
water 2020-03-01 4.0000
apples 2020-03-02 9.0000
water 2020-03-02 5.0000
wine 2020-03-02 10.0000
apples 2020-03-03 10.5000
melons 2020-03-03 12.0000
peanuts 2020-03-03 8.0000
wine 2020-03-03 5.0000
walnuts 2020-03-03 1.0000
apples all_days 34.5000
melons all_days 24.0000
peanuts all_days 8.0000
water all_days 9.0000
wine all_days 15.0000
walnuts all_days 1.5000
all_products all_days 92.0000

-- In the following query, we compute the total sales by category, section and global. The left joins in the outer query are used to get the description.
SELECT COALESCE(sections.description, 'all_sections') AS section,
       COALESCE(categories.description, 'all_categories') AS category,
       totals.total 
FROM (
    SELECT categoryid, sectionid, SUM(units * price) AS total 
    FROM products
    LEFT JOIN sales ON sales.productid = products.id
    GROUP BY GROUPING SETS ((categoryid), (sectionid), ())
) AS totals
LEFT JOIN categories ON totals.categoryid = categories.id
LEFT JOIN sections ON totals.sectionid = sections.id;

description sale_day total
front all_categories 43.5000
side all_categories 32.0000
back all_categories 16.5000
all_sections fresh food 58.5000
all_sections drinks 24.0000
all_sections dry food 9.5000
all_sections all_categories 92.0000

--We can use the GROUPING aggregate to identify which of the columns were used in the grouping, then filter the result based on this information. In the next example we create a view with the computed results of the previous example and then we filter the category exclusive aggregates.
CREATE VIEW sales_totals AS
    SELECT GROUPING(categoryid) AS category_aggregates, 
           GROUPING(sectionid) AS section_aggregates, 
           categoryid, sectionid, SUM(units * price) AS total 
    FROM products
    LEFT JOIN sales ON sales.productid = products.id
    GROUP BY GROUPING SETS ((categoryid), (sectionid), ());

SELECT description, total
FROM sales_totals
LEFT JOIN categories ON sales_totals.categoryid = categories.id
WHERE category_aggregates = 0;

description total
fresh food 58.5000
drinks 24.0000
dry food 9.5000

Implementation notes

Some implementation notes are left here:

  • Only identifiers are allowed inside ROLLUP, CUBE and GROUPING SETS clauses. Some other databases allow expressions as a non SQL standard extension.
  • The SQL type of the result of the GROUPING aggregate varies with the number of grouping columns. Up to 7 columns, it is a TINYINT; between 8 and 15 columns a SMALLINT; between 16 and  31 columns an INTEGER; between 32 and 63 columns a BIGINT; and finally a HUGEINT for 64 to 127 columns if available.

Developer and support

The grouping combinations extension was developed by Pedro Ferreira, a software developer at MonetDB Solutions. Feel free to sign up and send your questions to the MonetDB users-list.