As of MonetDB Jun2020 release, we have added support for ROLLUP, CUBE and GROUPING SETS features defined by the SQL:1999 standard in the GROUP BY clause. These keywords allow one to perform multiple grouping set combinations in a single statement, containing partial and total computed results. At the same time, the GROUPING aggregate, which identifies the grouping columns used in the output, has also been implemented.
The grammar is as follows:
group_by_clause:
GROUP BY group_by_element [',' group_by_element ... ]
group_by_element:
identifier
| ROLLUP '(' identifier [',' identifier ... ] ')'
| CUBE '(' identifier [',' identifier ... ] ')'
| GROUPING SETS '(' group_by_element [',' group_by_element ... ] ')'
| '(' ')'
The expression ‘(’ ‘)’ means a grouping without columns, i.e. a global aggregate.
In practice, whenever ROLLUP, CUBE and GROUPING SETS are used inside the GROUP BY clause, the incoming data is split for the generated grouping combinations, then each grouping is computed, then the results are returned. This can be seen as multiple GROUP BY clauses being generated, computed, then merged with UNION statements. Depending on the grouping keyword used, the generated grouping combination is different.
- ROLLUP represents a hierarchy of columns. For n grouping columns, ROLLUP generates n+1 groupings. As an example, ROLLUP (a,b,c) translates into four grouping combinations: (a,b,c), (a,b), (a) and ().
- CUBE represents all possible combinations of values in the supplemented columns. For n grouping columns, CUBE generates 2^n groupings. As an example, CUBE (a,b,c) translates into eight grouping combinations: (a,b,c), (a,b), (a,c), (b,c), (a), (b), (c) and ().
- GROUPING SETS computes the list of grouping combinations supplied as they are with no changes. For n grouping columns, GROUPING SETS generates n groupings. As an example, GROUPING SETS (a,b,c) translates into three grouping combinations: (a), (b) and (c).
Nested and mixed groupings:
- GROUPING SETS allows nesting of combinations, however, they are merged altogether. As an example, GROUPING SETS (a, GROUPING SETS (b), ROLLUP (c, d)), translates into (a), (b), (c, d), (c), ().
- The elements of ROLLUP and CUBE may contain lists of columns, in which they are treated as a single unit. As an example, ROLLUP (a,b,c,d), which translates into (a,b,c,d), (a,b,c), (a,b), (a) and (), is different from ROLLUP ((a,b),(c,d)), which translates into (a,b,c,d), (a,b) and ().
- If multiple grouping elements are given in the GROUP BY clause, the output combinations are the cross product of every generated sublist altogether. As an example, GROUP BY ROLLUP(a,b), c, CUBE(d,e) means ((a,b),(a),()) x (c) x ((d,e),(d),(e),()), which translates into (a,b,c,d,e), (a,b,c,d), (a,b,c,e), (a,b,c), (a,c,d,e), (a,c,d), (a,c,e), (c,d,e), (c,d), (c,e), (c).
The GROUPING aggregate has also been added to identify the grouping columns used for each row computed in the output. This aggregate takes as input identifiers for the group columns used in the GROUP BY clause. The output is bitmask indicating which of the arguments were not used in the output grouping set. The last grouping element corresponds to the least-significant bit.
Demonstration
Below, we list example queries with the added functionalities.
CREATE TABLE categories (id INTEGER, description TEXT);
CREATE TABLE sections (id INTEGER, description TEXT);
CREATE TABLE products (id INTEGER, categoryid int, sectionid INTEGER, description TEXT, price DECIMAL(6,2));
CREATE TABLE sales (productid INTEGER, sale_day DATE, units INTEGER);INSERT INTO categories VALUES (1, 'fresh food'), (2, 'dry food'), (3, 'drinks');
INSERT INTO sections VALUES (1, 'front'), (2, 'side'), (3, 'back');
INSERT INTO products VALUES (1, 1, 1, 'apples', 1.5), (2, 1, 2, 'melons', 4.0), (3, 2, 2, 'peanuts', 2.0), (4, 3, 1, 'water', 1.0), (5, 3, 3, 'wine', 5.0), (6, 2, 3, 'walnuts', 1.5);
INSERT INTO sales VALUES (1, date '2020-03-01', 10), (2, date '2020-03-01', 3), (4, date '2020-03-01', 4), (1, date '2020-03-02', 6), (4, date '2020-03-02', 5), (5, date '2020-03-02', 2), (1, date '2020-03-03', 7), (3, date '2020-03-03', 4), (2, date '2020-03-03', 3), (5, date '2020-03-03', 1), (6, date '2020-03-03', 1);-- In the first example we compute the total sales per product per day, then the total sales per product, and finally the overall total using the ROLLUP construct.
-- If a column was not used for the aggregate, it gets a NULL value by default. In our query, we use COALESCE to give them a meaningful name.
SELECT COALESCE(products.description, 'all_products') AS description,
COALESCE(sale_day, 'all_days') AS sale_day,
totals.total
FROM (
SELECT productid, sale_day, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY ROLLUP(productid, sale_day)
) AS totals
LEFT JOIN products ON products.id = totals.productid
ORDER BY sale_day NULLS LAST, productid NULLS LAST;
description | sale_day | total |
---|---|---|
apples | 2020-03-01 | 15.0000 |
melons | 2020-03-01 | 12.0000 |
water | 2020-03-01 | 4.0000 |
apples | 2020-03-02 | 9.0000 |
water | 2020-03-02 | 5.0000 |
wine | 2020-03-02 | 10.0000 |
apples | 2020-03-03 | 10.5000 |
melons | 2020-03-03 | 12.0000 |
peanuts | 2020-03-03 | 8.0000 |
wine | 2020-03-03 | 5.0000 |
walnuts | 2020-03-03 | 1.0000 |
apples | all_days | 34.5000 |
melons | all_days | 24.0000 |
peanuts | all_days | 8.0000 |
water | all_days | 9.0000 |
wine | all_days | 15.0000 |
walnuts | all_days | 1.5000 |
all_products | all_days | 92.0000 |
-- In the following query, we compute the total sales by category, section and global. The left joins in the outer query are used to get the description.
SELECT COALESCE(sections.description, 'all_sections') AS section,
COALESCE(categories.description, 'all_categories') AS category,
totals.total
FROM (
SELECT categoryid, sectionid, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY GROUPING SETS ((categoryid), (sectionid), ())
) AS totals
LEFT JOIN categories ON totals.categoryid = categories.id
LEFT JOIN sections ON totals.sectionid = sections.id;
description | sale_day | total |
---|---|---|
front | all_categories | 43.5000 |
side | all_categories | 32.0000 |
back | all_categories | 16.5000 |
all_sections | fresh food | 58.5000 |
all_sections | drinks | 24.0000 |
all_sections | dry food | 9.5000 |
all_sections | all_categories | 92.0000 |
--We can use the GROUPING aggregate to identify which of the columns were used in the grouping, then filter the result based on this information. In the next example we create a view with the computed results of the previous example and then we filter the category exclusive aggregates.
CREATE VIEW sales_totals AS
SELECT GROUPING(categoryid) AS category_aggregates,
GROUPING(sectionid) AS section_aggregates,
categoryid, sectionid, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY GROUPING SETS ((categoryid), (sectionid), ());SELECT description, total
FROM sales_totals
LEFT JOIN categories ON sales_totals.categoryid = categories.id
WHERE category_aggregates = 0;
description | total |
---|---|
fresh food | 58.5000 |
drinks | 24.0000 |
dry food | 9.5000 |
Implementation notes
Some implementation notes are left here:
- Only identifiers are allowed inside ROLLUP, CUBE and GROUPING SETS clauses. Some other databases allow expressions as a non SQL standard extension.
- The SQL type of the result of the GROUPING aggregate varies with the number of grouping columns. Up to 7 columns, it is a TINYINT; between 8 and 15 columns a SMALLINT; between 16 and 31 columns an INTEGER; between 32 and 63 columns a BIGINT; and finally a HUGEINT for 64 to 127 columns if available.
Developer and support
The grouping combinations extension was developed by Pedro Ferreira, a software developer at MonetDB Solutions. Feel free to sign up and send your questions to the MonetDB users-list.