GROUP BY clause

The GROUP BY clause specifies which grouping columns should be used to perform any aggregations in the SELECT clause. It returns one row for each group. For each group, you can apply an aggregate function such as count, min, max, sum, avg, median or quantile to compute group information.

When a GROUP BY clause is specified, all tuples that have matching data in the grouping columns (i.e., all tuples that belong to the same group) will be combined. NULLs are grouped to one group.

You may also use scalar expressions for the grouping columns.

The values of the grouping columns themselves are unchanged.

The syntax for GROUP BY clause is:

group_by_clause:
    GROUP BY { ALL | '*' | group_by_element [',' ...] }

group_by_element:
    scalar_exp
  | CUBE '(' ordinary_grouping_element [',' ...] ')'
  | ROLLUP '(' ordinary_grouping_element [',' ...] ')'
  | GROUPING SETS '(' grouping_set_element [',' ...] ')'
  | '(' ')'

ordinary_grouping_element:
    column_ref
  | '(' column_ref_commalist ')'

column_ref_commalist:
    column_ref
  | column_ref_commalist ',' column_ref

grouping_set_element:
    ordinary_grouping_element
  | CUBE '(' ordinary_grouping_element [',' ...] ')'
  | ROLLUP '(' ordinary_grouping_element [',' ...] ')'
  | GROUPING SETS '(' grouping_set_element [',' ...] ')'
  | '(' ')'

The GROUP BY ALL is equivalent to GROUP BY *. It will group by all non-aggregate columns of the SELECT list.

Examples:

-- Show number of tables per schema and table_type in descreasing count order
SELECT table_schema, table_type, count(*) AS table_count
  FROM information_schema.tables
 GROUP BY table_schema, table_type
 ORDER BY table_count DESC

-- Get the top-3 busiest train stations with minimal 5 services
SELECT station_name, count(*) AS num_services
  FROM train_services
 GROUP BY ALL
 HAVING count(*) >= 5
 ORDER BY num_services DESC
 LIMIT 3;

Scalar expressions are allowed in the GROUP BY and HAVING clauses. The same expressions can be used in the projection clause, if and only if they are literally equal e.g.:

SELECT count(*) * (type + language*10) as total
  FROM functions
 GROUP BY (type + language*10)
   HAVING (type + language*10) > 10
       OR count(*) > 99;

GROUP BY CUBE

The CUBE is an extension of the GROUP BY clause, which allows you to generate all possible combinations of groupings for specified columns, including subtotals and the grand total.

Here’s the syntax of the GROUP BY clause with CUBE:

SELECT column1, column2, aggregate_function (column3)
  FROM table_name
 GROUP BY CUBE (column1, column2);

In this syntax, we have two columns specified in the CUBE. The statement creates two subtotal combinations. Generally, if you have n number of columns listed in the CUBE, the statement will create 2n subtotal combinations.

The CUBE operator works as follows:

  • First, generate all possible combinations of grouped data for the specified columns.
  • Second, create an aggregate value for each combination of columns.
  • Third, include a grand total row that aggregates all the rows.

In practice, you’ll find the CUBE operator helpful for multidimensional analysis, like summarizing salary data by both cities and jobs.

Example:

-- create and populate a table first
CREATE TABLE salary_stats (
  city VARCHAR(40) NOT NULL,
  job  VARCHAR(40) NOT NULL,
  salary DECIMAL(9,2) NOT NULL);
INSERT INTO salary_stats VALUES
  ('Amsterdam','IT',80000.00),
  ('Amsterdam','HR',60000.00),
  ('Utrecht','IT',70000.00),
  ('Utrecht','HR',50000.00);

SELECT city, job, SUM(salary) AS total_salary
  FROM salary_stats
 GROUP BY CUBE (city, job);

+-----------+------+--------------+
| city      | job  | total_salary |
+===========+======+==============+
| Amsterdam | IT   |     80000.00 |
| Amsterdam | HR   |     60000.00 |
| Utrecht   | IT   |     70000.00 |
| Utrecht   | HR   |     50000.00 |
| null      | IT   |    150000.00 |    subtotal
| null      | HR   |    110000.00 |    subtotal
| Amsterdam | null |    140000.00 |    subtotal
| Utrecht   | null |    120000.00 |    subtotal
| null      | null |    260000.00 |    grand total
+-----------+------+--------------+

-- customizing the output, replace null with '', add labels, reorder
SELECT COALESCE(city, '') "City", COALESCE(job, '') "Job", SUM(salary) AS "Total Salary"
  FROM salary_stats
 GROUP BY CUBE (city, job)
 ORDER BY city ASC NULLS LAST, job DESC NULLS LAST;

+-----------+------+--------------+
| City      | Job  | Total Salary |
+===========+======+==============+
| Amsterdam | IT   |     80000.00 |
| Amsterdam | HR   |     60000.00 |
| Amsterdam |      |    140000.00 |    subtotal
| Utrecht   | IT   |     70000.00 |
| Utrecht   | HR   |     50000.00 |
| Utrecht   |      |    120000.00 |    subtotal
|           | IT   |    150000.00 |    subtotal
|           | HR   |    110000.00 |    subtotal
|           |      |    260000.00 |    grand total
+-----------+------+--------------+

GROUP BY ROLLUP

The GROUP BY clause within an aggregate function can only return an aggregate result at a single level of detail. To return aggregated results at multiple levels of details, you can use the ROLLUP with the GROUP BY clause:

SELECT column1, column2, aggregate_function (column_3)
  FROM table_name
 GROUP BY ROLLUP (column1, column2);

In this syntax, you place the columns you want to group within parentheses after the ROLLUP keyword.

The ROLLUP assumes that there is a hierarchy between column1 and column2: column1 > column2

Therefore, the ROLLUP helps generate reports that include subtotals and grand total.

How the ROLLUP works:

  • First, start with the most detailed grouping specified in the GROUP BY clause.
  • Second, move up the hierarchy by removing columns and calculating subtotals.
  • Third, end with a grand total row for all the rows.

Example:

SELECT city, job, SUM(salary) AS total_salary
  FROM salary_stats
 GROUP BY ROLLUP (city, job)
 ORDER BY city ASC NULLS LAST, job DESC NULLS FIRST;

+-----------+------+--------------+
| city      | job  | total_salary |
+===========+======+==============+
| Amsterdam | null |    140000.00 |    subtotal
| Amsterdam | IT   |     80000.00 |
| Amsterdam | HR   |     60000.00 |
| Utrecht   | null |    120000.00 |    subtotal
| Utrecht   | IT   |     70000.00 |
| Utrecht   | HR   |     50000.00 |
| null      | null |    260000.00 |    grand total
+-----------+------+--------------+

GROUP BY GROUPING SETS

If you want to create multiple groupings within the same query, you can use the GROUPING SETS option of the GROUP BY clause.

The GROUPING SETS clause is like having multiple GROUP BY clauses within the same query. In practice, you’ll find the GROUPING SETS helpful in calculating specific subtotals and grand totals.

Here’s the basic syntax of the GROUP BY clause with the GROUPING SETS:

SELECT column1, column2, aggregate_function (column_3)
  FROM table_name
 GROUP BY
  GROUPING SETS (
      (column1, column2),
      (column1),
      (column2),
      ()
  );

Example:

-- Use GROUPING SETS to create four groupings:
-- 1) grouping (city, job) groups salaries by city and job
-- 2) grouping (city) groups salaries by city
-- 3) grouping (job) groups salaries by job
-- 4) grouping () represents the grand total salary

SELECT city, job, SUM(salary) AS total_salary
  FROM salary_stats
 GROUP BY GROUPING SETS ((city, job),  (city),  (job),  ());

+-----------+------+--------------+
| city      | job  | total_salary |
+===========+======+==============+
| Amsterdam | IT   |     80000.00 |
| Amsterdam | HR   |     60000.00 |
| Utrecht   | IT   |     70000.00 |
| Utrecht   | HR   |     50000.00 |
| Amsterdam | null |    140000.00 |
| Utrecht   | null |    120000.00 |
| null      | IT   |    150000.00 |
| null      | HR   |    110000.00 |
| null      | null |    260000.00 |
+-----------+------+--------------+

SELECT city as "City", COALESCE(job, '') "Job", SUM(salary) AS "Total Salary"
  FROM salary_stats
 GROUP BY GROUPING SETS ((city), (city, job))
 ORDER BY city ASC, job DESC;

+-----------+------+--------------+
| City      | Job  | Total Salary |
+===========+======+==============+
| Amsterdam | IT   |     80000.00 |
| Amsterdam | HR   |     60000.00 |
| Amsterdam |      |    140000.00 |
| Utrecht   | IT   |     70000.00 |
| Utrecht   | HR   |     50000.00 |
| Utrecht   |      |    120000.00 |
+-----------+------+--------------+