Hi all,
We are using the June 2020 release and are trying to query a table with 24 million rows, joining each row of this 'left' table with one row from a 'right' table according to a specific join clause. This clause does not result in a Cartesian product.
The query structure is as follows:
    select left.*
    from myschema.mytable left
    inner join otherschema.yourtable right
      on left.somedatefield >= right.fromdate
     and left.somedatefield <= right.untildate
    where left.otherdatefield is null;
This query results in the following error in merovingian.log:
    2020-07-17 11:58:00 MSG merovingian[401281]: database 'datalake' (-1) has crashed with signal SIGSEGV (dumped core)
However, if I change the query as shown below, making the left table a derived table sorted by the date field used in the join, the error doesn't occur:
    select left.*
    from (select * from myschema.mytable order by somedatefield) left
    inner join otherschema.yourtable right
      on left.somedatefield >= right.fromdate
     and left.somedatefield <= right.untildate
    where left.otherdatefield is null;
When I tested the original query with subsets of the left table, there was a row-count threshold (26 rows in this case) below which the query ran fine.
Also, repeating the test on a different 'left' table with ~200k rows didn't result in a crash.
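For anyone who wants to reproduce the subset test mentioned above, it could be written roughly as follows. This is only a sketch: it assumes that LIMIT is accepted inside a derived table in this release, the value 26 is simply the row count under test, and the aliases l and r stand in for left and right (which are reserved words in SQL).

```sql
-- Sketch of the subset test: join only the first N rows of the left table.
-- Assumption: LIMIT inside a derived table is supported by the server.
-- Raise or lower the LIMIT value to look for the row count at which the crash appears.
select l.*
from (select * from myschema.mytable limit 26) l
inner join otherschema.yourtable r
  on l.somedatefield >= r.fromdate
 and l.somedatefield <= r.untildate
where l.otherdatefield is null;
```

Without an ORDER BY, which rows the LIMIT picks is not guaranteed, so the threshold may depend on the physical order of the table.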
Could it be that there's been a change in the query pipeline or join handling?
Kind regards,
Frank