Bug 2500

Summary: SQL: Performance slowdown on or
Product: SQL Reporter: Stefan de Konink <skinkie>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: enhancement    
Priority: ---    
Version: -- development   
Hardware: x86 (i?86)   
OS: Linux   
URL: http://sourceforge.net/support/tracker.php?aid=2816059

Description Stefan de Konink 2009-07-03 02:48:10 CEST
I received a take down notice of the KVK, until that is resolved I have to keep this academic...

With a new data provider aka real partner without legal battles; new bugs arise:

CREATE TABLE "sys"."faillissementen" (
        "kvk" bigint,
        "bedrijfsnaam" varchar(255),
        "plaats" varchar(32),
        "rechtbank" varchar(64),
        "rolnummer" varchar(16),
        "status" varchar(16),
        "datum" date
);

CREATE TABLE "sys"."kvk" (
        "kvk" bigint,
        "bedrijfsnaam" varchar(255),
        "adres" varchar(64),
        "postcode" varchar(6),
        "plaats" varchar(32),
        "type" varchar(16)
);

select * from faillissementen, kvk where faillissementen.kvk = kvk.kvk;
0 tuples
Timer      25.549 msec 0 rows

sql>select * from faillissementen, kvk where (faillissementen.kvk*10000) = kvk.kvk;
22 tuples
Timer      44.483 msec 22 rows

So far fine; now... for some reason nobody really cares to normalise their numbers so I want to join the two.
select * from faillissementen, kvk where (faillissementen.kvk*10000) = kvk.kvk or faillissementen.kvk = kvk.kvk;

Query gets totally out of bounds.

Now you may ask; so what if you do it like first year students?
select * from faillissementen, kvk where (faillissementen.kvk*10000) = kvk.kvk union select * from faillissementen, kvk where faillissementen.kvk = kvk.kvk;
20 tuples
Timer      57.804 msec 20 rows

Now you see that there are 20 results instead of 22. Basically what misses are the multiple results of the kvk side matching the number; The most trivial example:

sql>select kvk, adres from kvk where kvk = 10926910000;
+-------------+-------------------------+
| kvk         | adres                   |
+=============+=========================+
| 10926910000 | Anthonie Fokkerstraat 5 |
| 10926910000 | Anthonie Fokkerstraat 5 |
+-------------+-------------------------+

sql>select kvk.kvk, kvk.adres from faillissementen, kvk where (faillissementen.kvk*10000) = kvk.kvk union select kvk.kvk, kvk.adres from faillissementen, kvk where faillissementen.kvk = kvk.kvk;
+--------------+---------------------------+
| kvk          | adres                     |
+==============+===========================+
|  10926910000 | Anthonie Fokkerstraat 5   |
|  60680140000 | Spanjaardslaan 25 -1      |

sql>select kvk.kvk, kvk.adres from faillissementen, kvk where (faillissementen.kvk*10000) = kvk.kvk;
+--------------+---------------------------+
| kvk          | adres                     |
+==============+===========================+
|  10926910000 | Anthonie Fokkerstraat 5   |
|  10926910000 | Anthonie Fokkerstraat 5   |
Comment 1 Stefan de Konink 2009-07-03 13:40:29 CEST
As Niels suggested, union all solves the last part; didn't know union had a build in distinct.
Comment 2 Martin Kersten cwiconfidential 2009-08-06 15:45:09 CEST
stacked for possible future improvements of the query optimizer 
Comment 3 Stefan Manegold cwiconfidential 2009-08-14 13:58:23 CEST
tagged subject
Comment 4 Pseudo user for Sourceforge import 2010-05-04 09:32:09 CEST
This bug was previously known as tracker item 2816059 at http://sourceforge.net/support/tracker.php?aid=2816059