Hi all,

 

We are using the June 2020 release, trying to query a table with 24 mio rows and join each row from this ‘left’ table with one row from a right table according to a specific join clause. This clause does not result in a carthesian product.

 

The query structure is as follows:

 

select left.*

from myschema.mytable left

inner join otherschema.yourtable right

on left.somedatefield     >= right.fromdate

and left.somedatefield   <= right.untildate

where left.otherdatefield is null ;

 

This query results in following error in the merovingian.log:

2020-07-17 11:58:00 MSG merovingian[401281]: database 'datalake' (-1) has crashed with signal SIGSEGV (dumped core)

 

However, if I change the query like below, making the left table a derived table and sorting the date field used in the join, the error doesn’t occur:

 

select left.*

from (select * from myschema.mytable order by somedatefield) left

inner join otherschema.yourtable right

on left.somedatefield     >= right.fromdate

and left.somedatefield   <= right.untildate

where left.otherdatefield is null ;

 

When I tested the original query with a subset from the left table there was a threshold of a certain amount of rows (in this case 26 rows) below which the query ran fine.

 

Also, repeating the test on a different ‘left’ table with ~200k rows didn’t result in a crash.

 

Could it be that there’s been a change in the query pipeline or join handling?

 

Kind regards,

 

Frank