Some performance oddities in numerical comparisons on merge tables

Marc Ballarin marc.ballarin at 1und1.de
Fri Apr 15 17:18:27 CEST 2016


Hi,

I have a larger *merge table*, of which the relevant columns in this
query should be:
"timestamp"         TIMESTAMP
"abort_flag"         TINYINT

"timestamp" is roughly ascending, but far from monotonous.
"abort_flag" has a low cardinality with the following possible values
and number of occurrences:
|          0 | 735660 |
|          1 |  30119 |
|          2 |   1486 |
|          4 |     15 |

Both columns are completely populated.

Total row numbers are:
merge table: 3,773,559,881
current partition: 30,754,353

I get the the following performance numbers:
sql>select count(*) from mytable where timestamp between timestamp
'2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
in (0);   
+--------+
| L1     |
+========+
| 735660 |
+--------+
1 tuple (1.0s)

sql>select count(*) from mytable where timestamp between timestamp
'2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
= 0;
+--------+
| L1     |
+========+
| 735660 |
+--------+
1 tuple (34.0s)

So there is a very significant, and IMHO surprising, performance
difference between the "equals" and the "in" comparison.

Doing the same query directly against the sub table has a much more
consistent performance:
sql>select count(*) from mytable_20160415 where timestamp between
timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
abort_flag IN (0);
+--------+
| L1     |
+========+
| 735660 |
+--------+
1 tuple (618.914ms)

sql>select count(*) from mytable_20160415 where timestamp between
timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
abort_flag = 0;
+--------+
| L1     |
+========+
| 735660 |
+--------+
1 tuple (1.1s)

The same can be observed with another column, where ">= 4" is much
slower than the (in my case) equivalent "IN (4,5,6)".

I guess, for my uses I can always switch to the "in" conditions, because
all relevant columns should have very low cardinality, but in general,
this seems odd.

Is there an explanation for this behaviour, like optimizer problems with
merge tables? Or maybe a way to avoid it?

Regards,
Marc


More information about the users-list mailing list