Bug 6122

Summary: feature request: allow some SELECT a.col IN ( b.col FROM b ) FROM a behavior?
Product: SQL Reporter: Anthony Damico <ajdamico>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: enhancement CC: ajdamico, arkr17997, martin.van.dinther
Priority: Normal    
Version: 11.23.13 (Jun2016-SP2)   
Hardware: All   
OS: Windows   

Description Anthony Damico 2016-11-16 15:58:00 CET
User-Agent:       Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Build Identifier: 

this feature request uses the same example queries as https://www.monetdb.org/bugzilla/show_bug.cgi?id=6121


i am not sure if these three commands should be allowed according to the SQL standard:


SELECT ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) AS new_column FROM table_one ;
SELECT ( CASE WHEN ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM table_one ;
SELECT COUNT(*) , ( CASE WHEN ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM table_one GROUP BY new_column ;



Reproducible: Always

Steps to Reproduce:
CREATE TABLE table_one (this_column INTEGER) ;
INSERT INTO table_one VALUES (1) ;
INSERT INTO table_one VALUES (2) ;
INSERT INTO table_one VALUES (3) ;
INSERT INTO table_one VALUES (4) ;
INSERT INTO table_one VALUES (5) ;

CREATE TABLE table_two (this_column INTEGER) ;
INSERT INTO table_two VALUES (1) ;
INSERT INTO table_two VALUES (2) ;


# # # # related quaries that work without issue

SELECT SUM( ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) ) FROM table_one ;
SELECT MAX( ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) ) FROM table_one ;
SELECT AVG( CASE WHEN table_one.this_column IN ( SELECT this_column FROM table_two ) THEN 1 ELSE 0 END ) AS new_column FROM table_one ;


# # # # three examples with no proper error and no proper result

SELECT ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) AS new_column FROM table_one ;
SELECT ( CASE WHEN ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM table_one ;
SELECT COUNT(*) , ( CASE WHEN ( table_one.this_column IN ( SELECT this_column FROM table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM table_one GROUP BY new_column ;




if these are not supported by the SQL standard, then please close this feature request.
Comment 1 Martin van Dinther cwiconfidential 2016-11-24 19:17:10 CET
You can use sub-queries in the select-list, but they need to return a scalar value (and preferably be correlated) for each row of table_one. So use for instance:
SELECT this_column, (SELECT COUNT(*) FROM table_two t2 WHERE t2.this_column = table_one.this_column) AS new_column FROM table_one;

For the derivation of whether a match of this_column value exists in t2 or not, use:
SELECT this_column, (CASE WHEN (SELECT COUNT(*) FROM table_two t2 WHERE t2.this_column = table_one.this_column) = 0 THEN 0 ELSE 1 END) AS new_column FROM table_one;
+-------------+------------+
| this_column | new_column |
+=============+============+
|           1 |          1 |
|           2 |          1 |
|           3 |          0 |
|           4 |          0 |
|           5 |          0 |
+-------------+------------+
5 tuples (37.294ms)

You can use groupings on this query result, such as:
SELECT COUNT(*), (CASE WHEN (SELECT COUNT(*) FROM table_two t2 WHERE t2.this_column = table_one.this_column) = 0 THEN 0 ELSE 1 END) AS new_column  FROM table_one GROUP BY new_column;

This query unfortunately does not return a result in the Jun2016-SP2 release, which is a defect. See Bug 6121.

To (temporarily) workaround this defect problem, I found you can use a view. Create it via:
CREATE VIEW table_one_vw AS
SELECT this_column, (CASE WHEN (SELECT COUNT(*) FROM table_two t2 WHERE t2.this_column = table_one.this_column) = 0 THEN 0 ELSE 1 END) AS new_column FROM table_one;

and next do the grouping on the view:
SELECT COUNT(*), new_column FROM table_one_vw GROUP BY new_column;

or with multiple aggregations per group:
SELECT COUNT(*) AS count, MIN(this_column) AS min_this_column, MAX(this_column) AS max_this_column, AVG(this_column) AS avg_this_column, SUM(this_column) AS sum_this_column, new_column FROM table_one_vw GROUP BY new_column;
+-------+----------+----------+--------------------------+----------+----------+
| count | min_this | max_this | avg_this_column          | sum_this | new_colu |
:       : _column  : _column  :                          : _column  : mn       :
+=======+==========+==========+==========================+==========+==========+
|     2 |        1 |        2 |                      1.5 |        3 |        1 |
|     3 |        3 |        5 |                        4 |       12 |        0 |
+-------+----------+----------+--------------------------+----------+----------+
2 tuples (50.325ms)

These solutions give correct results, even if the values in this_column contain duplicates in table_one or in table_two or in both. Try it.

If you do not want to create a view (once), you can also use a WITH common table expression construct:
WITH table_one_cte AS (SELECT this_column, (CASE (SELECT COUNT(*) FROM table_two t2 WHERE t2.this_column = table_one.this_column) WHEN 0 THEN 0 ELSE 1 END) AS new_column FROM table_one)
SELECT COUNT(*) AS count, MIN(this_column) AS min_this_column, MAX(this_column) AS max_this_column, AVG(this_column) AS avg_this_column, SUM(this_column) AS sum_this_column, new_column FROM table_one_cte GROUP BY new_column;