case performance

Roberto Cornacchia roberto.cornacchia at gmail.com
Mon Oct 7 14:33:26 CEST 2019


Hi Daniel,

[ the Users mailing list is probably more suited for your question ]

The main reason why your first approach is slower is that MonetDB is
column-oriented.
It doesn't evaluate your condition per input tuple, but per input column.

See the EXPLAIN of your example, using one of my existing databases (TABLE=
spinque.info, FIELD=attribute, PATTERN=lock):

|     C_9:bat[:oid] := sql.tid(X_8:int, "spinque":str, "info":str);

                                                            |
|     X_12:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 0:int);
                                                                    |
|     (X_17:bat[:oid], X_18:bat[:str]) := sql.bind(X_8:int, "spinque":str,
"info":str, "attribute":str, 2:int);
                                                              |
|     X_15:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 1:int);
                                                                    |
|     X_21:bat[:str] := sql.projectdelta(C_9:bat[:oid], X_12:bat[:str],
X_17:bat[:oid], X_18:bat[:str], X_15:bat[:str]);
                                                                 |
|     X_26:bat[:bit] := algebra.project(X_21:bat[:str], false:bit);

                                                            |
Just binding input columns to variables

|     X_32:bat[:bit] := batcalc.ifthenelse(X_26:bat[:bit], false:bit,
true:bit);
Evaluating your first constant condition (user_case_sensitive_pref=1)

|     X_36:bat[:bit] := batalgebra.like(X_21:bat[:str], "%lock%":str);

                                                             |
Evaluating the "then" branch

|     X_40:bat[:bit] := batalgebra.ilike(X_21:bat[:str], "%lock%":str);

                                                            |
Evaluating the "else" branch

|     X_42:bat[:bit] := batcalc.ifthenelse(X_32:bat[:bit], X_36:bat[:bit],
X_40:bat[:bit]);
                                                              |
Evaluating the condition. This produces a boolean column with either the
value from the "then" branch or the value from the "else" branch

|     C_45:bat[:oid] := algebra.thetaselect(X_42:bat[:bit], true:bit,
"==":str);
                                                                   |
Evaluating where, in the original column, the condition is satisfied

|     X_47:bat[:str] := algebra.projection(C_45:bat[:oid], X_21:bat[:str]);

                                                            |
Fetching the values, in the original column, where the condition is
satisfied

|     sql.resultSet(X_59:bat[:str], X_61:bat[:str], X_63:bat[:str],
X_65:bat[:int], X_67:bat[:int], X_47:bat[:str]);
                                                                     |

This is really an overkill for your statement. Firstly, because it always
evaluates fully both the output branches of your condition. Secondly,
because of the overhead needed for this approach.
Without a specialised "contains" UDF, this is how MonetDB handles it in a
generic way. It is the price to pay to get the benefits that column
decomposition brings in other cases.

In this case, you would get better performance by implementing a BAT
function (not a function that works on a single value, but on a column of
values).
You would write that in C using the GDK api (
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFunction),
and evaluate the condition i the loop, per value.
However, this requires you to recompile MonetDB to include your new
function.

You could also try the relatively new JIT C UDF mechanism, which allows you
to write your function in C but with a simpler API and without recompiling:
https://www.monetdb.org/blog/JIT_C_C%252B%252B_UDFs_in_MonetDB

Best regards,
Roberto

On Mon, 7 Oct 2019 at 13:17, Daniel Zvinca <daniel.zvinca at logbis.com> wrote:

> Hello,
>
> I am exploring MonetDB queries performance that are generated based on
> user parameters/preferences.
> To be more specific the where clause is generated based on a user
> interface mapping user input parameters straight into SQL statements.
> See below one example of this transformation.
>
> FIELD contains PATTERN (case sensitive/insensitive)
>
> is transformed into
>
> select ... from TABLE where
> (case when (user_case_sensitive_pref=1) then ("FIELD" like '%PATTERN%')
> else  ("FIELD" ilike '%PATTERN%')
> end);
>
> Performance of this in comparison with query generated from within the
> code:
>
> char * query;
> if ( user_case_sensitive_pref )
>   query = "select ... from TABLE where  \"FIELD\" like '%PATTERN%'";
> else
>   query = "select ... from TABLE where  \"FIELD\" ilike '%PATTERN%'";
>
> query_execute( query);
>
> is huge (second approach takes 0.12 sec, vs 6 sec for first one).
> But it requires a far more complex implementation.
>
> First I thought this has to do with user variables but even if the
> statement is
>
> select ... from TABLE where
> (case when TRUE then ("FIELD" like '%PATTERN%')
> else  ("FIELD" ilike '%PATTERN%')
> end);
>
> timing is similarly slow.
>
> This narrows down to "case" performance issue, as far as I can see.
>
> Am I wrong?
>
> Thank you,
> Dan
>
>
>
>
>
>
>
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20191007/40a4b731/attachment.htm>


More information about the developers-list mailing list