Hi Daniel,

[ the Users mailing list is probably more suited for your question ]

The main reason why your first approach is slower is that MonetDB is column-oriented.

It doesn't evaluate your condition per input tuple, but per input column.

See the EXPLAIN of your example, using one of my existing databases (TABLE=spinque.info, FIELD=attribute, PATTERN=lock):

| C_9:bat[:oid] := sql.tid(X_8:int, "spinque":str, "info":str); |
| X_12:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 0:int); |
| (X_17:bat[:oid], X_18:bat[:str]) := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 2:int); |
| X_15:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 1:int); |
| X_21:bat[:str] := sql.projectdelta(C_9:bat[:oid], X_12:bat[:str], X_17:bat[:oid], X_18:bat[:str], X_15:bat[:str]); |
| X_26:bat[:bit] := algebra.project(X_21:bat[:str], false:bit); |
Just binding input columns to variables

| X_32:bat[:bit] := batcalc.ifthenelse(X_26:bat[:bit], false:bit, true:bit);

Evaluating your first constant condition (user_case_sensitive_pref=1)

| X_36:bat[:bit] := batalgebra.like(X_21:bat[:str], "%lock%":str); |

Evaluating the "then" branch

| X_40:bat[:bit] := batalgebra.ilike(X_21:bat[:str], "%lock%":str); |
Evaluating the "else" branch

| X_42:bat[:bit] := batcalc.ifthenelse(X_32:bat[:bit], X_36:bat[:bit], X_40:bat[:bit]); |
Evaluating the condition. This produces a boolean column with either the value from the "then" branch or the value from the "else" branch

| C_45:bat[:oid] := algebra.thetaselect(X_42:bat[:bit], true:bit, "==":str); |
Evaluating where, in the original column, the condition is satisfied

| X_47:bat[:str] := algebra.projection(C_45:bat[:oid], X_21:bat[:str]); |

Fetching the values, in the original column, where the condition is satisfied

| sql.resultSet(X_59:bat[:str], X_61:bat[:str], X_63:bat[:str], X_65:bat[:int], X_67:bat[:int], X_47:bat[:str]); |

This is really an overkill for your statement. Firstly, because it always evaluates fully both the output branches of your condition. Secondly, because of the overhead needed for this approach.

Without a specialised "contains" UDF, this is how MonetDB handles it in a generic way. It is the price to pay to get the benefits that column decomposition brings in other cases.

In this case, you would get better performance by implementing a BAT function (not a function that works on a single value, but on a column of values).

You would write that in C using the GDK api (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFunction), and evaluate the condition i the loop, per value.

However, this requires you to recompile MonetDB to include your new function.

You could also try the relatively new JIT C UDF mechanism, which allows you to write your function in C but with a simpler API and without recompiling: https://www.monetdb.org/blog/JIT_C_C%252B%252B_UDFs_in_MonetDB

Best regards,

Roberto

On Mon, 7 Oct 2019 at 13:17, Daniel Zvinca <daniel.zvinca@logbis.com> wrote:

Hello,

I am exploring MonetDB queries performance that are generated based on user parameters/preferences.
To be more specific the where clause is generated based on a user interface mapping user input parameters straight into SQL statements.
See below one example of this transformation.

FIELD contains PATTERN (case sensitive/insensitive)

is transformed into

select ... from TABLE where
(case when (user_case_sensitive_pref=1) then ("FIELD" like '%PATTERN%')
else ("FIELD" ilike '%PATTERN%')
end);

Performance of this in comparison with query generated from within the code:

char * query;
if ( user_case_sensitive_pref )
query = "select ... from TABLE where \"FIELD\" like '%PATTERN%'";
else
query = "select ... from TABLE where \"FIELD\" ilike '%PATTERN%'";

query_execute( query);

is huge (second approach takes 0.12 sec, vs 6 sec for first one).
But it requires a far more complex implementation.

First I thought this has to do with user variables but even if the statement is

select ... from TABLE where
(case when TRUE then ("FIELD" like '%PATTERN%')
else ("FIELD" ilike '%PATTERN%')
end);

timing is similarly slow.

This narrows down to "case" performance issue, as far as I can see.

Am I wrong?

Thank you,
Dan

_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list