Aggregate functions operate on a column or column expression of a table or a group (as created via GROUP BY clause).
We support all the well-known standard SQL aggregate functions:
COUNT(*|...)
, COUNT(DISTINCT ...)
, SUM(...)
, AVG(...)
, MIN(...)
, MAX(...)
and even PROD(...)
over scalar types/expressions and groupings.
In addition, many important statistical aggregate functions are supported.
Also character string aggregate functions
LISTAGG
and GROUP_CONCAT
and json.tojsonarray()
are available.
For the examples below to create the shown results, we use a simple table t with data:
CREATE TABLE t(i int, c varchar(8));
INSERT INTO t(i,c) VALUES (1,'one'), (2,'two'), (3,'tree'), (4,'four'), (NULL,NULL);
Function | Return type | Description | Example | Result |
---|---|---|---|---|
avg(numeric_expr) | double | compute the average of the non-NULL integer/float/decimal/month_interval values in the column or group | avg(i) | 2.5 |
count(*) | bigint | count the number of rows in the column or group including those who may have NULL values | count(*) | 5 |
count(col_expr) | bigint | count the number of non-NULL values in the column or group | count(i) | 4 |
count(distinct col_expr) | bigint | count the number of distinct non-NULL values in the column or group | count(distinct i%2) | 2 |
max(col_expr) | type of col_expr | return the maximum non-NULL value in the column or group | max(c) | two |
min(col_expr) | type of col_expr | return the minimum non-NULL value in the column or group | min(c) | four |
prod(numeric_expr) | bigint or hugeint or double | compute the product of the non-NULL numeric values in the column or group | prod(i) | 24 |
prod(distinct numeric_expr) | bigint or hugeint or double | compute the product of the distinct non-NULL numeric values in the column or group | prod(distinct 1 + i%2) | 2 |
sum(integer_expr) | bigint or hugeint | compute the sum of all non-NULL integer values in the column or group | sum(i) | 10 |
sum(distinct integer_expr) | bigint or hugeint | compute the sum of all distinct non-NULL integer values in the column or group | sum(distinct i/2) | 3 |
sum(decimal_expr) | decimal | compute the sum of all non-NULL decimal values in the column or group | sum(cast(i as decimal)) | 10.000 |
sum(float_expr) | real or double | compute the sum of all non-NULL floating point values in the column or group | sum(cast(i as real)) | 10.0 |
sum(month_interval_expr) | month_interval | compute the sum of all non-NULL interval month values in the column or group | sum(cast(i as interval month)) | 10 |
sum(sec_interval_expr) | sec_interval | compute the sum of all non-NULL interval second values in the column or group | sum(cast(i as interval second)) | 10.000 |
The SQL:2016 standard LISTAGG
is supported. It is equivalent to sys.group_concat
aggregate function.
In the below examples you can simply replace sys.group_concat
with LISTAGG
and get the same results.
LISTAGG
does not require the sys. prefix.
For the examples below we add some data to the example table t:
INSERT INTO t(i,c) VALUES
(1, ''), (1, 'kwik'),
(2, NULL), (2, 'kwek'),
(3, ' '), (3, 'kwak'),
(4, 'kwak'), (4, ' ');
to create the shown Results.
Function | Return type | Description | Example | Result |
---|---|---|---|---|
sys.group_concat(col_expr) | varchar | concatenate all non-NULL values in a column or group into one string with values separated by a comma (the default separator) | sys.group_concat(c) | kwek, ,kwak,kwak, ,one,two,tree,four,,kwik |
sys.group_concat(ALL col_expr) | varchar | same as sys.group_concat(col_expr), i.e. ALL is the default behaviour | sys.group_concat(ALL c) | one,two,tree,four,,kwik,kwek, ,kwak,kwak, |
sys.group_concat(DISTINCT col_expr) | varchar | concatenate all non-NULL and unique values in a column or group into one string with values separated by a comma (the default separator) | sys.group_concat(DISTINCT c) | one,two,tree,four,,kwik,kwek, ,kwak |
sys.group_concat(DISTINCT (col_expr)) | varchar | same as sys.group_concat(DISTINCT col_expr) | sys.group_concat(DISTINCT (c)) | one,two,tree,four,,kwik,kwek, ,kwak |
sys.group_concat(col_expr, separator_str) | varchar | concatenate non-NULL values in a column or group into one string with values separated by separator_str | sys.group_concat(i, '|') | 1|2|3|4|1|1|2|2|3|3|4|4 |
sys.group_concat(col_expr, separator_col_expr) | varchar | concatenate non-NULL values in a column or group into one string with values separated by separator_col_expr | sys.group_concat(i, c) | 1two2tree3four41kwik12kwek2 3kwak3kwak4 4 |
sys.group_concat(DISTINCT col_expr, separator_str) | varchar | concatenate non-NULL and unique values in a column or group into one string with values separated by separator_str | sys.group_concat(DISTINCT i, '|') | 1|2|3|4 |
sys.group_concat(DISTINCT col_expr, separator_col_expr) | varchar | concatenate non-NULL values in a column or group into one string with values separated by separator_col_expr. DISTINCT is computed over (col_expr, separator_col_expr). | sys.group_concat(DISTINCT i, c) | 1two2tree3four41kwik12kwek2 3kwak3kwak4 4 |
sys.group_concat(col_expr1 ORDER BY col_expr2) | varchar | concatenate all non-NULL values in a column or group into one string with values separated by a comma (the default separator). Order the values of col_expr1 according to the ascending order (default) of the values of col_expr2 | sys.group_concat(c ORDER BY c) | , , ,four,kwak,kwak,kwek,kwik,one,tree,two |
sys.group_concat(col_expr1 ORDER BY col_expr2 DESC) | varchar | concatenate all non-NULL values in a column or group into one string with values separated by a comma (the default separator). Order the values of col_expr1 according to the descending order of the values of col_expr2 | sys.group_concat(c ORDER BY c DESC) | two,tree,one,kwik,kwek,kwak,kwak,four, , , |
sys.group_concat(col_expr1) WITHIN GROUP (ORDER BY col_expr2) | varchar | same as sys.group_concat(col_expr1 ORDER BY col_expr2) | sys.group_concat(c) WITHIN GROUP (ORDER BY c) | , , ,four,kwak,kwak,kwek,kwik,one,tree,two |
sys.group_concat(col_expr1, separator_str ORDER BY col_expr2) | varchar | concatenate all non-NULL values in a column or group into one string with values separated by separator_str. Order the values of col_expr1 according to the ascending order of the values of col_expr2 | sys.group_concat(c, '|' ORDER BY c) | | | |four|kwak|kwak|kwek|kwik|one|tree|two |
sys.group_concat(DISTINCT col_expr1, separator_str ORDER BY col_expr2) | varchar | concatenate all non-NULL and unique values in a column or group into one string with values separated by separator_str. Order the values of col_expr1 according to the ascending order of the values of col_expr2 | sys.group_concat(DISTINCT c, '|' ORDER BY c) | | |four|kwak|kwek|kwik|one|tree|two |
Note: for the GROUP_CONCAT
examples above without an ORDER BY
condition, we have shown only one of their correct result.
Without an ordering, the input values can be concatenated in any order.
In addition, depending on the implementation, every reexecution of the same function can produce a different but correct result.
To demonstrate this, we show different results for sys.group_concat(c)
and sys.group_concat(ALL c)
, even though they are essentially the same function.
Function | Return type | Description | Example | Result |
---|---|---|---|---|
json.tojsonarray(clob) | varchar | convert string values in the column or group into a json array string | json.tojsonarray(c) | [ "one", "two", "tree", "four" ] |
json.tojsonarray(double) | varchar | convert numeric values in the column or group into a json array string | json.tojsonarray(i) | [ 1.000000, 2.000000, 3.000000, 4.000000 ] |
Note: You must include the json. prefix to these aggregate functions for them to work properly.
Note: You must include the sys. prefix for below aggregate functions (see examples) in order to work properly.
Statistic Function | Return type | Description | Example | Result |
---|---|---|---|---|
sys.corr(numeric_a, numeric_b) | double | return the correlation coefficient of the non-NULL values in the numeric (integer/decimal/real/double) column or group numeric_a versus numeric_b | sys.corr(i, i+2) | 0.7500000000000001 |
sys.covar_pop(numeric_a, numeric_b) | double | return the population covariance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group numeric_a versus numeric_b. Supported from release Jun2020 (11.37.7) | sys.covar_pop(i, i*2) | 2.5 |
sys.covar_samp(numeric_a, numeric_b) | double | return the sample covariance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group numeric_a versus numeric_b. Supported from release Jun2020 (11.37.7) | sys.covar_samp(i, i*2) | 3.3333333333333335 |
sys.median(col_expr) | type of col_expr | return the statistic median of the non-NULL values in the numeric (integer/decimal/real/double) or temporal (date/time/timestamp/interval) column or group | sys.median(i) | 2 |
sys.median_avg(numeric_expr) | double | return the statistic median average of the non-NULL values in the numeric (integer/decimal/real/double) column or group | sys.median_avg(i) | 2.5 |
sys.quantile(col_expr, percentile) | type of col_expr | return the statistic quantile of the non-NULL values in the numeric (integer/decimal/real/double) or temporal (date/time/timestamp/interval) column or group. The percentile argument must be between 0.0 and 1.0 | sys.quantile(i, 0.7) | 3 |
sys.quantile_avg(numeric_expr, percentile) | double | return the statistic quantile average of the non-NULL values in the numeric (integer/decimal/real/double) column or group. The percentile argument must be between 0.0 and 1.0 | sys.quantile_avg(i, 0.6) | 2.8 |
sys.stddev_pop(numeric_expr) | double | return the population standard deviation of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square root of var_pop(). | sys.stddev_pop(i) | 1.118033988749895 |
sys.stddev_samp(numeric_expr) | double | return the sample standard deviation of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square root of var_samp(). | sys.stddev_samp(i) | 1.2909944487358056 |
sys.var_pop(numeric_expr) | double | return the population standard variance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square of stddev_pop(). | sys.var_pop(i) | 1.25 |
sys.var_samp(numeric_expr) | double | return the sample standard variance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square of stddev_samp(). | sys.var_samp(i) | 1.6666666666666667 |
For sys.quantile
the percentile argument is a float value between 0.0 and 1.0.sys.median(expr)
is equivalent to sys.quantile(expr, 0.5)
.
sys.stddev_samp
computes the cumulative sample standard deviation and returns the square root of the sample variance.
sys.stddev_pop
computes the population standard deviation and returns the square root of the population variance.
Both functions take as an argument any numeric datatype.
Likewise, sys.var_samp
and sys.var_pop
functions return the sample variance (/n-1)
of a set of numbers and the biased variance (/n) of a set of numbers, respectively.
The aggregate functions sys.median_avg
and sys.quantile_avg
return the
interpolated value if the median/quantile doesn't fall exactly on a particular row.
These functions always return a value of type DOUBLE and only work for
numeric types (various width integers, decimal and floating point).
create schema aggr_tst;
set schema aggr_tst;
create table aggr_tst.tc (c real);
insert into aggr_tst.tc values (1), (2), (3), (4), (5), (9);
select * from aggr_tst.tc;
select
count(*) as countstar
, count(c) as count
, count(distinct c) as countdistinct
, sum(c) as sum
, avg(c) as average
, prod(c) as product
, min(c) as minimum
, max(c) as maximum
, group_concat(c,',') as group_concat
, listagg(c,',' order by c desc) as listagg
, median(c) as median
, median_avg(c) as median_avg
, quantile(c, 0.5) as quantile
, quantile_avg(c, 0.5) as quantile_avg
, stddev_samp(c) as stddev_samp
, stddev_pop(c) as stddev_pop
, var_samp(c) as var_samp
, var_pop(c) as var_pop
, corr(c, c+1) as corr
, covar_samp(c, c*2) as covar_samp
, covar_pop(c, c*2) as covar_pop
from aggr_tst.tc;
drop table aggr_tst.tc;
set schema sys;
drop schema aggr_tst;
Tip: To list all the available aggregate functions in your MonetDB server run query:
SELECT DISTINCT schema_id, name, func, mod, language, system FROM sys.functions WHERE type = 3 ORDER BY name;
You can also create your own aggregate functions, see Create Aggregate function command.