Bug 6677

Summary: MonetDB should optimize queries of the form select * from foo where FALSE, select * from foo where 0 = 1, etc.
Product: SQL Reporter: aris <aris.koning>
Component: allAssignee: SQL devs <bugs-sql>
Status: ASSIGNED ---    
Severity: normal CC: mk, niels
Priority: Normal    
Version: 11.31.13 (Aug2018-SP2)   
Hardware: x86_64 (amd64/em64t)   
OS: Linux   

Description aris 2019-01-15 13:48:51 CET
User-Agent:       Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0
Build Identifier: 

Many (dynamically typed) language specific ORM frameworks when connecting to a database table, have to perform an initial query to create (something like) a runtime class representing the table's schema.

For instance in R when using DBI, MonetDB.R, dbplyr libraries to connect to table foo in MonetDB, the following code

    conn <- dbConnect(MonetDB.R(), host="localhost", dbname="testdb", user="monetdb", password="monetdb")
    foo <- tbl(conn, "foo")
    foo %>% filter(bar %like% '%baz%') %>% collect()

will first perform a dummy query whose result set will be empty in order to get the necessary meta-data:
    select * from foo where 0 = 1;
Afterwards it constructs and submits the actual query:
    select * from foo where bar like '%baz%';

The problem is that MonetDB produces plans that are oblivious to the fact that the predicate is an anti-tautology and performs a (full) scan to check the data against the predicate. Which obviously takes an unnecessarily long time to finish on a large table.

Since it is a MAL optimizer that is responsible to evaluate constant expressions, I propose to implement an additional optimizer further in the pipeline that checks for static (anti-)tautologies in the plan and consecutively shortcuts the relevant parts of the script by immediately returning empty results where appropriate.

This is the second time that we at MonetDB Solutions have hit this problem.





Reproducible: Always

Steps to Reproduce:
1.set optimizer = 'sequential_pipe';
2.explain select bar from foo where 0 = 1;
Actual Results:  
X_6:int := sql.mvc();
...MAL statements...
C_7:bat[:oid] := sql.tid(X_6:int, "sys":str, "foo":str);
X_10:bat[:str] := sql.bind(X_6:int, "sys":str, "foo":str, "bar":str, 0:int);
X_19:bat[:str] := algebra.projection(C_7:bat[:oid], X_10:bat[:str]);
X_46:bat[:bit] := algebra.project(X_19:bat[:str], true:bit);
C_47:bat[:oid] := algebra.thetaselect(X_46:bat[:bit], false:bit, "==":str);
...more MAL statements...

Expected Results:  
X_6:int := sql.mvc();
...MAL statements...
C_7:bat[:oid] := sql.tid(X_6:int, "sys":str, "foo":str);
X_10:bat[:str] := sql.bind(X_6:int, "sys":str, "foo":str, "bar":str, 0:int);
X_47:bat[:oid] := bat.new(nil:str);
...more MAL statements...
Comment 1 Niels Nes cwiconfidential 2019-01-23 13:36:31 CET
probably easier to implement a function select( cand, bool) which returns cand or empty cand. And use this in the rel2bin_select lines 2804/5
Comment 2 aris 2019-01-25 09:40:46 CET
I think it sounds good.
Comment 3 Martin Kersten cwiconfidential 2019-07-27 22:51:35 CEST
Constant evaluation is limited to scalar expressions and does not take part at
the SQL optimizer layer (except for trivial cases).
Expanding them to expressions over constant BATs is a major effort.
Furthermore, if the where clause produces an empty candidate list, then
the subsequent projections are cheap anyway.

I would not consider this worth looking into.

We might consider "SELECT * FROM R LIMIT 0" as a better alternative.
(unfortunately LIMIT now requires a positive number)