Bug 6333 - Feature Request: Support aggregation functions FIRST() and LAST()
Summary: Feature Request: Support aggregation functions FIRST() and LAST()
Status: NEW
Alias: None
Product: SQL
Classification: Unclassified
Component: all (show other bugs)
Version: -- development
Hardware: All All
: Normal enhancement
Assignee: SQL devs
Depends on:
Reported: 2017-06-06 18:37 CEST by Lefteris
Modified: 2019-12-11 17:29 CET (History)
2 users (show)


Note You need to log in before you can comment on or make changes to this bug.
Description Lefteris 2017-06-06 18:37:05 CEST
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier: 

they are especially usefull if you want to select columns that are not part of a group by. For example if the table is


select table.a, first(table.b) from table group by table.a;

will return


Reproducible: Always
Comment 1 Martin van Dinther cwiconfidential 2017-06-08 13:35:22 CEST
SQL is a set-oriented language in which the (storage) order is not relevant, meaning it can change between queries (different data processing strategies) and during its life time (e.g. when an internal reorganisation or backup/restore is done).
So aggregates FIRST() and LAST() could return different values when running the same query multiple times.

Instead you could/should use MIN() and MAX() aggregates.

create table tabel (a int, b char(1));
insert into tabel values (1, 'x'), (1, 'y'), (2, 'z');
select a, min(b) as min_b, max(b) as max_b from tabel group by a order by a;
Comment 2 Lefteris 2017-06-08 13:42:55 CEST
MIN and MAX are extra work. Returning differernt values if the storage has been updated is ok. See oracle or postgress how they support first/last:

Comment 3 Stefan Manegold cwiconfidential 2017-07-12 14:56:56 CEST
As Martin vD suggested in his comment, FIRST() and LAST() have no clearly defined (intuitive) semantics in set-oriented SQL --- even LIMIT n without order by does not, other some returning "any" n tuples, possibly different ones with each query invocation (i.e., not even "guaranteed" to be deterministic).

Thus, maybe we could consider "ANY()" instead of "FIRST()" & "LAST()"?

Comment 4 Martin van Dinther cwiconfidential 2019-12-11 17:29:59 CET
As of Apr2019 release (11.33.3) we support SQL window functions:
 FIRST_VALUE '(' query_expression ')'
 LAST_VALUE '(' query_expression ')'
 NTH_VALUE '(' query_expression ',' query_expression ')'
See https://www.monetdb.org/blog/extended_sql_window_functions

With these new functions you could implement your desired query outputs by using for instance:
sql>select distinct a, first_value(b) over(partition by a) from tabel;
| a    | L2   |
|    1 | x    |
|    2 | z    |
2 tuples
sql>select distinct a, last_value(b) over(partition by a) from tabel;
| a    | L2   |
|    1 | y    |
|    2 | z    |
2 tuples

Alternatively you can create your own aggregate function using CREATE AGGREGATE command.
See https://www.monetdb.org/Documentation/Manuals/SQLreference/SQLSyntaxOverview#CREATE_AGGREGATE_LANGUAGE
for an example defined in language Python. Other supported languages are C, C++ and R if you prefer those.