Bug 6968 - Support SET ANSI_NULLS ON | OFF
Summary: Support SET ANSI_NULLS ON | OFF
Status: NEW
Alias: None
Product: SQL
Classification: Unclassified
Component: all (show other bugs)
Version: 11.37.11 (Jun2020-SP1)
Hardware: Other Mac OS X
: Normal enhancement
Assignee: SQL devs
URL:
Keywords:
: 6969 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-09-15 06:21 CEST by justin.jin
Modified: 2020-10-29 18:28 CET (History)
4 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description justin.jin 2020-09-15 06:21:36 CEST
User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Build Identifier: 

I want to make a null equals to another null.

Currently, the behavior is same to SET ANSI_NULLS ON.

There exists a workaround that using "a.xxx = b.xxx OR (a.xxx IS NULL and b.xxx IS NULL)"

The only problem is the bad performance.

I tried to search some solutions in monetdb documents, but no results.

Does anyone can give me some advices?

Reproducible: Always
Comment 1 Pedro Ferreira 2020-09-15 09:08:11 CEST
Recently we added the 'semantics' flag on the SQL layer, for NULL = NULL comparisons at some places (e.g. a.xxx is NULL). We can add an optimizer where it takes an expression "a.xxx = b.xxx OR (a.xxx IS NULL and b.xxx IS NULL)" and re-write it to "a.xxx = b.xxx" with 'semantics' flag on. However currently I have been very busy with lots of fixes regarding performance, so expect to have low priority. 

Meanwhile some of these queries can be re-written with outer joins. Are you sure this optimization is needed?
Comment 2 justin.jin 2020-09-15 09:32:10 CEST
We want to treat NULL as a normal value, so the expected results contains full 
field from two joined tables.

Using outer join only adding not matched rows from one side (includes NULL), it doesn't meet our expectation.

Replacing "OR" with "UNION ALL" can improving some performance.

But the query time is still too long when 4 or more dimensions and 10+ measures involved(and every one maybe joined several times).

I'm glad to hear the fixing is on the way.
Can we increase the priority since it has impacted the performance?

(In reply to Pedro Ferreira from comment #1)
> Recently we added the 'semantics' flag on the SQL layer, for NULL = NULL
> comparisons at some places (e.g. a.xxx is NULL). We can add an optimizer
> where it takes an expression "a.xxx = b.xxx OR (a.xxx IS NULL and b.xxx IS
> NULL)" and re-write it to "a.xxx = b.xxx" with 'semantics' flag on. However
> currently I have been very busy with lots of fixes regarding performance, so
> expect to have low priority. 
> 
> Meanwhile some of these queries can be re-written with outer joins. Are you
> sure this optimization is needed?

We want to treat NULL as a normal value, so the expected results contains full 
field from two joined tables.

Using outer join only adding not matched rows from one side (includes NULL), it doesn't meet our expectation.

Replacing "OR" with "UNION ALL" can improving some performance.

But the query time is still too long when 4 or more dimensions and 10+ measures involved(and every one maybe joined several times).

I'm glad to hear the fixing is on the way.
Can we increase the priority since it has impacted the performance?
Comment 5 Pedro Ferreira 2020-09-16 12:43:57 CEST
*** Bug 6969 has been marked as a duplicate of this bug. ***
Comment 6 Martin van Dinther cwiconfidential 2020-10-29 18:28:49 CET
In SQL NULL is never equal to another NULL by definition.

What you need to do is replace the NULLs in your data column(s) with a non-used domain value, for instance 0 for ints. Simply do:
 UPDATE a SET xxx = 0 WHERE xxx IS NULL;
 UPDATE b SET xxx = 0 WHERE xxx IS NULL;
Now you can leave out the
  OR (a.xxx IS NULL and b.xxx IS NULL)
part from you join/where condition.

If for some reason your are not allowed to update those a.xxx and b.xxx columns are need to retain the NULL info, extend both tables with an copy of those xxx columns which you can update with the values of xxx without nulls.
 ALTER TABLE a ADD COLUMN xxx2 int;
 UPDATE a SET xxx2 = coalesce(xxx, 0);
 ALTER TABLE b ADD COLUMN xxx2 int;
 UPDATE b SET xxx2 = coalesce(xxx, 0);
Use those copies to do your join condition on: a.xxx2 = b.xxx2

This will run fast and do what you need in the proper SQL way.

Let me know if we can close this non standard SQL request.