Bug 4022

Summary: Preventing CAST( AS VARCHAR) from TEXT within WHERE part
Product: SQL Reporter: Stefan de Konink <stefan>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: enhancement CC: martin.van.dinther, webhostinguk
Priority: Normal    
Version: 11.21.19 (Jul2015-SP4)   
Hardware: Other   
OS: Linux   

Description Stefan de Konink 2016-06-13 11:15:38 CEST
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36
Build Identifier: 

The query generator in Tableau Desktop 9.3 is generating queries in this template.

SELECT AVG(CAST("punctdep"."punctuality" AS FLOAT)) AS "avg:punctuality:ok",
  CAST("punctdep"."classificatie" AS VARCHAR(100)) AS "classificatie",
  COUNT("punctdep"."punctuality") AS "cnt:punctuality:ok",
  CAST("punctdep"."lineplanningnumber" AS VARCHAR(100)) AS "lineplanningnumber",
  CAST("punctdep"."name" AS VARCHAR(100)) AS "name",
  "punctdep"."pointorder" AS "pointorder",
  SUM("punctdep"."punctuality") AS "sum:punctuality:ok"
FROM "sys"."punctdep" "punctdep"
WHERE (((CAST("punctdep"."classificatie" AS VARCHAR(100)) IN ('t < -60', 't > 300')) OR ((CAST("punctdep"."classificatie" AS VARCHAR(100)) >= '-60 <= t < 0') AND (CAST("punctdep"."classificatie" AS VARCHAR(100)) <= '60 < t <= 120'))) AND (CAST("punctdep"."journeynumber" AS VARCHAR(100)) >= '1') AND (CAST("punctdep"."journeynumber" AS VARCHAR(100)) <= '9997') AND (CAST("punctdep"."lineplanningnumber" AS VARCHAR(100)) = '4') AND ("punctdep"."operatingday" = DATE'2016-03-08') AND (CAST("punctdep"."richting" AS VARCHAR(100)) = 'Zoetermeer, Javalaan'))
GROUP BY "classificatie",
  "lineplanningnumber",
  "name",
  "pointorder";

Schema definition:

CREATE MERGE TABLE "sys"."punctdep" (
	"operatingday"       DATE,
	"lineplanningnumber" CHARACTER LARGE OBJECT,
	"journeynumber"      CHARACTER LARGE OBJECT,
	"pointorder"         SMALLINT,
	"userstopcode"       CHARACTER LARGE OBJECT,
	"punctuality"        INTEGER,
	"name"               CHARACTER LARGE OBJECT,
	"vanaf"              CHARACTER LARGE OBJECT,
	"richting"           CHARACTER LARGE OBJECT,
	"vehiclenumber"      INTEGER,
	"ltmin61"            TINYINT,
	"min60"              TINYINT,
	"pos60"              TINYINT,
	"pos120"             TINYINT,
	"pos180"             TINYINT,
	"pos300"             TINYINT,
	"gt300"              TINYINT,
	"classificatie"      CHARACTER LARGE OBJECT,
	"dow"                TINYINT,
	"hod"                TINYINT
);

Optimisations that are possible but not used in MonetDB:

Classificatie is a column which has just a few distinct values. The distinct values are selected inclusively while the user actually wants to prevent 'Onbekend' to be selected. Would it be reasonable for a database system to do a faster negation?


Stupid query generation from the Client:

Columns are recasted. This results in MonetDB performance to decrease ten fold. I wonder: how reasonable would it be to make a cast to varchar a no-op when applied to a text (character large object) where the proposed varchar constraint is larger than the (const) equality constraint hence: the column is not shrunken to facilitate equality. An other suggestion might be to optimise:


cast(bla to varchar(2)) = 'hello world' => 'hello world' = 11, 11 > 2, op to false;
cast(bla to varchar(12)) = 'hello world' => 'hello world' = 11, 11 >= 12, strncmp('hello world', bla, max(12, strlen('hello world'))

Reproducible: Always




MonetDB 5 server v11.24.0 (64-bit, 64-bit oids, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 62.8GiB available memory, 8 available cpu cores
Libraries:
  libpcre: 8.38 2015-11-23 (compiled with 8.38)
  openssl: OpenSSL 1.0.2h  3 May 2016 (compiled with OpenSSL 1.0.2h  3 May 2016)
  libxml2: 2.9.4 (compiled with 2.9.4)
Compiled by: skinkie@chamechaude (x86_64-pc-linux-gnu)
Compilation: gcc -O3  -pipe    -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking    : /usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64
Comment 1 Martin van Dinther cwiconfidential 2016-06-23 18:48:51 CEST
I guess that Tableau is generating the CAST(... AS VARCHAR(100)) parts as most RDBMS do not allow complex operations on CLOBs, so convert it to VARCHAR first.
MonetDB however does allow to use all string operations even on CLOB columns.

To 'help' Tableau to not generate the CAST(... AS VARCHAR(100)) parts you could create a view (or copy the table or recreate the table) to have all CLOB columns to be of type VARCHAR(100) instead.

CREATE VIEW "sys"."punctdep_noCLOB" AS
SELECT "operatingday", CAST("lineplanningnumber" AS VARCHAR(100)) AS "lineplanningnumber", CAST("journeynumber" AS VARCHAR(100)) AS "journeynumber", "pointorder", CAST("userstopcode" AS VARCHAR(100)) AS "userstopcode", "punctuality", CAST("name" AS VARCHAR(100)) AS "name", CAST("vanaf" AS VARCHAR(100)) AS "vanaf", CAST("richting" AS VARCHAR(100)) AS "richting", "vehiclenumber", "ltmin61", "min60", "pos60", "pos120", "pos180", "pos300", "gt300", CAST("classificatie" AS VARCHAR(100)) AS "classificatie", "dow", "hod" FROM "sys"."punctdep";

Please try and let us know if it helps.

This would be a workaround, not an implementation of the enhancement of course.
Comment 2 Martin van Dinther cwiconfidential 2018-11-20 19:21:33 CET
Hi Stefan,

Did the workaround with the view work?
If not, you should avoid using CHARACTER LARGE OBJECT in the CREATE MERGE TABLE and use VARCHAR(999) instead (or VARCHAR(9999) depending on the content of the column data), for example:
CREATE MERGE TABLE "sys"."punctdep" (
	"operatingday"       DATE,
	"lineplanningnumber" VARCHAR(999),
	"journeynumber"      VARCHAR(999),
	"pointorder"         SMALLINT,
	"userstopcode"       VARCHAR(999),
	"punctuality"        INTEGER,
	"name"               VARCHAR(999),
	"vanaf"              VARCHAR(999),
	"richting"           VARCHAR(999),
	"vehiclenumber"      INTEGER,
	"ltmin61"            TINYINT,
	"min60"              TINYINT,
	"pos60"              TINYINT,
	"pos120"             TINYINT,
	"pos180"             TINYINT,
	"pos300"             TINYINT,
	"gt300"              TINYINT,
	"classificatie"      VARCHAR(999),
	"dow"                TINYINT,
	"hod"                TINYINT
);
This way Tableau knows it can use: "classificatie" IN ('case1', 'case2') without the need to add CAST("classificatie" AS VARCHAR(100)) functions as the "classificatie" column is already a VARCHAR.
This should speed up your queries.
Comment 3 Stefan de Konink 2018-11-20 20:13:15 CET
I was sadly not able to test the workaround.