Window functions

Window functions mk Sun, 10/13/2013 - 13:52

SQL provides a method to aggregate over a series of related tuples. They are called window functions and always come with an OVER() clause which determines how tuples are split up over the window functions.The PARTITION BY clause within OVER divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed over all rows participating in the group. The order within a partition can be used as well.

window_function_spec:
    window_function OVER   { window_name | '(' window_specification ')' }

window_function:
      RANK '(' ')'
    | DENSE_RANK '(' ')'
    | PERCENT_RANK '(' ')'
    | ROW_NUMBER '(' ')'
    | CUME_DIST '(' ')'
    | FIRST_VALUE '(' query_expression ')'
    | LAST_VALUE '(' query_expression ')'
    | NTH_VALUE '(' query_expression ',' query_expression ')'
    | NTILE '(' query_expression ')'
    | LEAD '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | LAG '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | aggregate_function

aggregate_function:
      COUNT '(' '*' ')'
    | COUNT '(' query_expression ')'
    | MAX '(' query_expression ')'
    | MIN '(' query_expression ')'
    | SUM '(' query_expression ')'
    | PROD '(' query_expression ')'
    | AVG '(' query_expression ')'

window_specification:
    [ window_name ]   [ PARTITION BY column_ref   [ ',' ... ] ]   [ ORDER BY sort_spec ]   [ window_frame ]

window_frame:
    { ROWS | RANGE | GROUPS }   { window_frame_start | BETWEEN window_bound AND window_bound }
        [ EXCLUDING { CURRENT ROW | GROUP | TIES | NO OTHERS }   ]

window_frame_start:
     UNBOUNDED PRECEDING
   | value PRECEDING
   | CURRENT ROW

window_bound:
    window_frame_start
   | UNBOUNDED FOLLOWING
   | value FOLLOWING

Supported Window Functions:
    RANK() : BIGINT - Returns the rank number within a partition, starting at 1.
    DENSE_RANK() : BIGINT - Returns the rank of the current row without gaps, it counts peer groups.
    PERCENT_RANK() : DOUBLE - Calculates the relative rank of the current row: (rank() - 1) / (rows in partition - 1).
    ROW_NUMBER() : BIGINT - Returns the position of the tuple currently in the result set, starting at 1.
    CUME_DIST() : DOUBLE - Calculates the cumulative distribution: number of rows preceding or peer with current row / rows in partition.
    FIRST_VALUE(input A) : A - Returns input value at first row of the window frame.
    LAST_VALUE(input A) : A - Returns input value at last row of the window frame.
    NTH_VALUE(input A, nth BIGINT) : A - Returns input value at “nth” row of the window frame. If there is no “nth” row in the window frame, then NULL is returned.
    NTILE(nbuckets BIGINT) : BIGINT - Enumerates rows from 1 in each partition, dividing it in the most equal way possible.
    LAG(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” before the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
    LEAD(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” after the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
    MIN(input A) : A
    MAX(input A) : A
    COUNT(*) : BIGINT
    COUNT(input A) : BIGINT
    SUM(input A) : A
    PROD(input A) : A
    AVG(input A) : DOUBLE.

The supported frames are:
    ROWS - Frames are calculated on physical offsets of input rows.
    RANGE - Result frames are calculated on value differences from input rows (used with a custom PRECEDING or FOLLOWING bound requires an ORDER BY clause).
    GROUPS - Groups of equal row values are used to calculate result frames (requires an ORDER BY clause).

See also en.wikibooks.org/wiki/Structured_Query_Language/Window_functions

 

Examples:

create table ranktest (id int, k varchar(3));
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');

select ROW_NUMBER() over () as foo from ranktest;
select ROW_NUMBER() over (PARTITION BY id) as foo, id from ranktest;
select ROW_NUMBER() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select ROW_NUMBER() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo from ranktest;
select RANK() over (PARTITION BY id) as foo, id from ranktest;
select RANK() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select RANK() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo, id, k from ranktest;
select RANK() over (PARTITION BY id) as foo, id, k from ranktest;
select RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest;
select RANK() over (ORDER BY id, k) as foo, id, k from ranktest;

select DENSE_RANK() over () as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (ORDER BY id, k) as foo, id, k from ranktest order by k;
drop table ranktest;

For more examples see extended_sql_window_functions