The SQL implementation provides the well-known standard SQL aggregate functions COUNT(*|...), COUNT(DISTINCT ...), SUM(...), AVG(...), MIN(...) and MAX(...) over scalar types/expressions and groupings. In addition, a few important statistical aggregate functions: MEDIAN, QUANTILE, STDDEV, VAR and correlation CORR are available. From release Jun2020 (11.37.7) also COVAR_SAMP and COVAR_POP are available.
extra_statistical_aggregate_functions: sys.median '(' scalar_expr ')' | sys.median_avg '(' scalar_expr ')' | sys.quantile '(' scalar_expr ',' scalar_expr ')' | sys.quantile_avg '(' scalar_expr ',' scalar_expr ')' | sys.stddev_samp '(' scalar_expr ')' | sys.stddev_pop '(' scalar_expr ')' | sys.var_samp '(' scalar_expr ')' | sys.var_pop '(' scalar_expr ')' | sys.corr '(' scalar_expr ',' scalar_expr ')' | sys.covar_samp '(' scalar_expr ',' scalar_expr ')' | sys.covar_pop '(' scalar_expr ',' scalar_expr ')' |
For sys.quantile the percentile argument is a float value between 0.0 and 1.0. sys.median(<expr>) is equivalent to sys.quantile(<expr>, 0.5).
sys.stddev_samp computes the cumulative sample standard deviation and returns the square root of the sample variance. sys.stddev_pop computes the population standard deviation and returns the square root of the population variance. Both functions take as an argument any numeric datatype.
Likewise, sys.var_samp and sys.var_pop functions return the sample variance (/n-1) of a set of numbers and the biased variance (/n) of a set of numbers, respectively.
Note: The aggregate functions sys.median_avg and sys.quantile_avg are added in Nov2019 (11.35.3) release. They return the interpolated value if the median/quantile doesn't fall exactly on a particular row. These functions always return a value of type DOUBLE and only work for numeric types (various width integers, decimal and floating point).
Usage example:
create schema aggr_tst; set schema aggr_tst; create table tc (c real); insert into tc values (1), (2), (3), (4), (5), (9); select * from tc; select count(*) countstar, COUNT(c) count, COUNT(DISTINCT c) countdistinct , SUM(c) sum, AVG(c) average, PROD(c) product , MIN(c) minimum, MAX(c) maximum , sys.MEDIAN(c) median , sys.MEDIAN_AVG(c) median_avg , sys.QUANTILE(c, 0.5) quantile , sys.QUANTILE_AVG(c, 0.5) quantile_avg , sys.STDDEV_SAMP(c) stddev_samp , sys.STDDEV_POP(c) stddev_pop , sys.VAR_SAMP(c) var_samp , sys.VAR_POP(c) var_pop , sys.CORR(c, c+1) corr , sys.COVAR_SAMP(c, c*2) covar_samp , sys.COVAR_POP(c, c*2) covar_pop from tc; drop table tc; set schema sys; drop schema aggr_tst;
Tip: To view all the available aggregate functions in your MonetDB server use query:
SELECT * FROM sys.functions where type = 3;