Bug 6333

Summary: Feature Request: Support aggregation functions FIRST() and LAST()
Product: SQL Reporter: Lefteris <lsidir>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: enhancement CC: martin.van.dinther, Stefan.Manegold
Priority: Normal    
Version: -- development   
Hardware: All   
OS: All   

Description Lefteris 2017-06-06 18:37:05 CEST
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier: 

they are especially usefull if you want to select columns that are not part of a group by. For example if the table is


|a|b|
-----
|1|x|
+-+-+
|1|y|
+-+-+
|2|z|
+-+-+

select table.a, first(table.b) from table group by table.a;

will return

+-+-+
|1|x|
+-+-+
|2|z|
+-+-+


Reproducible: Always
Comment 1 Martin van Dinther cwiconfidential 2017-06-08 13:35:22 CEST
SQL is a set-oriented language in which the (storage) order is not relevant, meaning it can change between queries (different data processing strategies) and during its life time (e.g. when an internal reorganisation or backup/restore is done).
So aggregates FIRST() and LAST() could return different values when running the same query multiple times.

Instead you could/should use MIN() and MAX() aggregates.

create table tabel (a int, b char(1));
insert into tabel values (1, 'x'), (1, 'y'), (2, 'z');
select a, min(b) as min_b, max(b) as max_b from tabel group by a order by a;
Comment 2 Lefteris 2017-06-08 13:42:55 CEST
MIN and MAX are extra work. Returning differernt values if the storage has been updated is ok. See oracle or postgress how they support first/last:

https://wiki.postgresql.org/wiki/First/last_(aggregate)
Comment 3 Stefan Manegold cwiconfidential 2017-07-12 14:56:56 CEST
As Martin vD suggested in his comment, FIRST() and LAST() have no clearly defined (intuitive) semantics in set-oriented SQL --- even LIMIT n without order by does not, other some returning "any" n tuples, possibly different ones with each query invocation (i.e., not even "guaranteed" to be deterministic).

Thus, maybe we could consider "ANY()" instead of "FIRST()" & "LAST()"?

Stefan
Comment 4 Martin van Dinther cwiconfidential 2019-12-11 17:29:59 CET
As of Apr2019 release (11.33.3) we support SQL window functions:
 FIRST_VALUE '(' query_expression ')'
 LAST_VALUE '(' query_expression ')'
and
 NTH_VALUE '(' query_expression ',' query_expression ')'
See https://www.monetdb.org/blog/extended_sql_window_functions

With these new functions you could implement your desired query outputs by using for instance:
sql>select distinct a, first_value(b) over(partition by a) from tabel;
+------+------+
| a    | L2   |
+======+======+
|    1 | x    |
|    2 | z    |
+------+------+
2 tuples
sql>select distinct a, last_value(b) over(partition by a) from tabel;
+------+------+
| a    | L2   |
+======+======+
|    1 | y    |
|    2 | z    |
+------+------+
2 tuples

Alternatively you can create your own aggregate function using CREATE AGGREGATE command.
See https://www.monetdb.org/Documentation/Manuals/SQLreference/SQLSyntaxOverview#CREATE_AGGREGATE_LANGUAGE
for an example defined in language Python. Other supported languages are C, C++ and R if you prefer those.