Bug 2684 - avoid pcre.like_uselect when no modifiers are present
Summary: avoid pcre.like_uselect when no modifiers are present
Status: RESOLVED INVALID
Alias: None
Product: SQL
Classification: Unclassified
Component: all (show other bugs)
Version: 2.40.1 (Oct2010) [obsolete]
Hardware: Other Linux
: Normal enhancement
Assignee: SQL devs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-06 13:25 CEST by Stefan de Konink
Modified: 2010-12-10 15:29 CET (History)
0 users



Attachments
My output (4.22 KB, text/plain)
2010-12-10 14:39 CET, Stefan de Konink
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan de Konink 2010-10-06 13:25:00 CEST
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.24 Safari/534.7
Build Identifier: 

Even for the query select * from tables where name like 'tables'; pcre.like_uselect is used. I wonder if it is better to solve modifier 'free' cases in SQL or directly in the PCRE module. Hence: column like 'argument' should become column = 'argument'.

Reproducible: Always
Comment 1 Fabian Groffen 2010-10-06 15:11:07 CEST
this is a regression, because this used to fallback to uselect when no modifiers were in use
Comment 2 Fabian Groffen 2010-10-28 15:40:29 CEST
The implementation of pcre.like_uselect calls a normal uselect if no patterns are used.  You can't see this in the explain (MAL) code, because this is a runtime optimisation on the C level.

Conclusion, we already avoid calling pcre matching functions for simple like queries.
Comment 3 Stefan de Konink 2010-10-28 15:45:34 CEST
Not so fast.

In your reasoning it should have the same performance and not being 300x slower.

sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (676.854ms)
Timer     676.915 msec 2 rows
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (2.718ms)
Comment 4 Fabian Groffen 2010-12-10 14:33:07 CET
sql>select count(*) from sensor_readings;
+----------+
| L7       |
+==========+
| 71323546 |
+----------+
1 tuple (59.000ms)
sql>select count(*) from sensor_readings where location like '192.16.196.148';
+------+
| L10  |
+======+
|    0 |
+------+
1 tuple (1.5s)
sql>select count(*) from sensor_readings where location = '192.16.196.148';
+------+
| L11  |
+======+
|    0 |
+------+
1 tuple (2.2s)


Are you sure you aren't looking at cold/warm database effects?
Comment 5 Fabian Groffen 2010-12-10 14:37:26 CET
additional demonstration that the performance differs per run, but somehow 'like' seems to faster in general than '=':

sql>select count(*) from sensor_readings where location like 'L318';
+----------+
| L4       |
+==========+
| 13383466 |
+----------+
1 tuple (2.3s)
sql>select count(*) from sensor_readings where location = 'L318';
+----------+
| L5       |
+==========+
| 13383474 |
+----------+
1 tuple (1.8s)
sql>select count(*) from sensor_readings where location like 'L318';
+----------+
| L4       |
+==========+
| 13383484 |
+----------+
1 tuple (830.000ms)
sql>select count(*) from sensor_readings where location = 'L318';
+----------+
| L5       |
+==========+
| 13383492 |
+----------+
1 tuple (1.8s)
sql>select count(*) from sensor_readings where location like 'L318';
+----------+
| L4       |
+==========+
| 13383496 |
+----------+
1 tuple (0.9s)
sql>


(yes the results differ, since the database is continuously updated with new sensor readings, also from this room I'm querying for)
Comment 6 Stefan de Konink 2010-12-10 14:39:08 CET
I see something totally different.
Comment 7 Stefan de Konink 2010-12-10 14:39:34 CET
Created attachment 47 [details]
My output
Comment 8 Fabian Groffen 2010-12-10 14:44:33 CET
Can you run explain please?  like is being ran with multiple cores, while the = is not.
Comment 9 Fabian Groffen 2010-12-10 14:50:20 CET
This is not so much a bug.

If you don't want mitosis to kick in (exploit parallelism) because you know it never is going to make anything for you faster, you best just disable it by removing it from the optimiser pipeline you have in use for your databases.
Comment 10 Stefan de Konink 2010-12-10 15:14:20 CET
(In reply to comment #8)
> Can you run explain please?  like is being ran with multiple cores, while the =
> is not.

sql>explain select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
% .explain # table_name
% mal # name
% clob # type
% 0 # length
function user.s2_1{autoCommit=true}():void;
    _2 := sql.mvc();
barrier _93 := language.dataflow();
    _46:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",0,0@0:oid,1310385@0);
    _7:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",2);
    _51 := algebra.kdifference(_46,_7);
    _53 := algebra.semijoin(_7,_46);
    _55 := algebra.kunion(_51,_53);
    _11:bat[:oid,:oid]  := sql.bind_dbat(_2,"sys","kvk",1);
    _12 := bat.reverse(_11);
    _57 := algebra.kdifference(_55,_12);
    _58 := batcalc.str(_57);
    _61 := pcre.like_uselect(_58,"MonetDB B.V.":str,"":str);
    _64 := algebra.markT(_61,3,0);
    _76 := bat.reverse(_64);
    _49:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",0,0@0:oid,1310385@0);
    _13:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",2);
    _81 := algebra.kdifference(_49,_13);
    _83 := algebra.semijoin(_13,_49);
    _85 := algebra.kunion(_81,_83);
    _87 := algebra.leftjoin(_76,_85);
    _48:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",0,1310385@0,nil:oid);
    _52 := algebra.kdifference(_48,_7);
    _54 := algebra.semijoin(_7,_48);
    _56 := algebra.kunion(_52,_54);
    _59 := batcalc.str(_56);
    _62 := pcre.like_uselect(_59,"MonetDB B.V.":str,"":str);
    _68 := algebra.markT(_62,3,1);
    _79 := bat.reverse(_68);
    _50:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",0,1310385@0,nil:oid);
    _82 := algebra.kdifference(_50,_13);
    _84 := algebra.semijoin(_13,_50);
    _86 := algebra.kunion(_82,_84);
    _88 := algebra.leftjoin(_79,_86);

    _9:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",1);
    _60 := batcalc.str(_9);
    _63 := pcre.like_uselect(_60,"MonetDB B.V.":str,"":str);
    _71 := algebra.markT(_63,3,2);
    _80 := bat.reverse(_71);
    _14:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",1);
    _89 := algebra.leftjoin(_80,_14);
    _15 := mat.pack(_87,_88,_89);
    _92 := algebra.leftjoin(_80,_9);
    _91 := algebra.leftjoin(_79,_56);
    _90 := algebra.leftjoin(_76,_55);
exit _93;
    _17 := sql.resultSet(2,1,_15);
    sql.rsColumn(_17,"sys.kvk","kvk","bigint",64,0,_15);
    _16 := mat.pack(_90,_91,_92);
    sql.rsColumn(_17,"sys.kvk","bedrijfsnaam","varchar",256,0,_16);
    _28 := io.stdout();
    sql.exportResult(_28,_17);
end s2_1;


sql>explain select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
% .explain # table_name
% mal # name
% clob # type
% 0 # length
function user.s3_1{autoCommit=true}():void;
    _2 := sql.mvc();
    _3:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",0);
    _8 := algebra.uselect(_3,"MonetDB B.V.":str);
    _9:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",2);
    _11 := algebra.kdifference(_8,_9);
    _12 := algebra.uselect(_9,"MonetDB B.V.":str);
    _13 := algebra.kunion(_11,_12);
    _14:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",1);
    _16 := algebra.uselect(_14,"MonetDB B.V.":str);
    _17 := algebra.kunion(_13,_16);
    _18:bat[:oid,:oid]  := sql.bind_dbat(_2,"sys","kvk",1);
    _19 := bat.reverse(_18);
    _20 := algebra.kdifference(_17,_19);
    _21 := algebra.markT(_20,0@0:oid);
    _22 := bat.reverse(_21);
    _23:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",0);
    _24:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",2);
    _25 := algebra.kdifference(_23,_24);
    _26 := algebra.kunion(_25,_24);
    _27:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",1);
    _28 := algebra.kunion(_26,_27);
    _29 := algebra.leftjoin(_22,_28);
    _30 := algebra.kdifference(_3,_9);
    _31 := algebra.kunion(_30,_9);
    _32 := algebra.kunion(_31,_14);
    _33 := algebra.leftjoin(_22,_32);
    _34 := sql.resultSet(2,1,_29);
    sql.rsColumn(_34,"sys.kvk","kvk","bigint",64,0,_29);
    sql.rsColumn(_34,"sys.kvk","bedrijfsnaam","varchar",256,0,_33);
    _45 := io.stdout();
    sql.exportResult(_45,_34);
end s3_1;

Mitosis off:

sql>explain select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
% .explain # table_name
% mal # name
% clob # type
% 0 # length
function user.s1_1{autoCommit=true}():void;
    _2 := sql.mvc();
barrier _59 := language.dataflow();
    _3:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",0);
    _8:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",2);
    _10 := algebra.kdifference(_3,_8);
    _11 := algebra.kunion(_10,_8);
    _12:bat[:oid,:str]  := sql.bind(_2,"sys","kvk","bedrijfsnaam",1);
    _14 := algebra.kunion(_11,_12);
    _15:bat[:oid,:oid]  := sql.bind_dbat(_2,"sys","kvk",1);
    _16 := bat.reverse(_15);
    _17 := algebra.kdifference(_14,_16);
    _18 := batcalc.str(_17);
    _19 := pcre.like_uselect(_18,"MonetDB B.V.":str,"":str);
    _20 := algebra.markT(_19,0@0:oid);
    _21 := bat.reverse(_20);
    _22:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",0);
    _23:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",2);
    _24 := algebra.kdifference(_22,_23);
    _25 := algebra.kunion(_24,_23);
    _26:bat[:oid,:lng]  := sql.bind(_2,"sys","kvk","kvk",1);
    _27 := algebra.kunion(_25,_26);
    _28 := algebra.leftjoin(_21,_27);
    _29 := algebra.leftjoin(_21,_14);
exit _59;
    _30 := sql.resultSet(2,1,_28);
    sql.rsColumn(_30,"sys.kvk","kvk","bigint",64,0,_28);
    sql.rsColumn(_30,"sys.kvk","bedrijfsnaam","varchar",256,0,_29);
    _41 := io.stdout();
    sql.exportResult(_41,_30);
end s1_1;


sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (1.8s)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (681.097ms)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (676.295ms)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam like 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (675.428ms)

sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (2.437ms)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (0.754ms)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (0.724ms)
sql>select kvk,bedrijfsnaam from kvk where bedrijfsnaam = 'MonetDB B.V.';
+--------------+--------------+
| kvk          | bedrijfsnaam |
+==============+==============+
| 321351760000 | MonetDB B.V. |
| 321351760000 | MonetDB B.V. |
+--------------+--------------+
2 tuples (0.764ms)

Clearly Fabian this is not the cause, Mitosis of performs even worse. And we are talking about big factors. Again: we should rewrite like without modifiers, but given the performance drain, there might be more issues anyway.
Comment 11 Fabian Groffen 2010-12-10 15:29:03 CET
File a new performance request.  like is not using pcre when no modifier are present.  The batcalc.str() is looking weird to me though, and probably causes lots of time to be spent.  Please file a new bug, attaching the plan and trace outputs.