Some performance oddities in numerical comparisons on merge tables

Stefan Manegold Stefan.Manegold at cwi.nl
Fri Apr 15 20:31:54 CEST 2016


Hi Marc,

profiling your queries by prefixing them with keyword TRACE
and sharing the resulting profiles would be a prerequisite for
us to analyze /understand what happens in these cases.

Thanks!

Stefan

----- On Apr 15, 2016, at 5:18 PM, Marc Ballarin marc.ballarin at 1und1.de wrote:

> Hi,
> 
> I have a larger *merge table*, of which the relevant columns in this
> query should be:
> "timestamp"         TIMESTAMP
> "abort_flag"         TINYINT
> 
> "timestamp" is roughly ascending, but far from monotonous.
> "abort_flag" has a low cardinality with the following possible values
> and number of occurrences:
>|          0 | 735660 |
>|          1 |  30119 |
>|          2 |   1486 |
>|          4 |     15 |
> 
> Both columns are completely populated.
> 
> Total row numbers are:
> merge table: 3,773,559,881
> current partition: 30,754,353
> 
> I get the the following performance numbers:
> sql>select count(*) from mytable where timestamp between timestamp
> '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
> in (0);
> +--------+
>| L1     |
> +========+
>| 735660 |
> +--------+
> 1 tuple (1.0s)
> 
> sql>select count(*) from mytable where timestamp between timestamp
> '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
> = 0;
> +--------+
>| L1     |
> +========+
>| 735660 |
> +--------+
> 1 tuple (34.0s)
> 
> So there is a very significant, and IMHO surprising, performance
> difference between the "equals" and the "in" comparison.
> 
> Doing the same query directly against the sub table has a much more
> consistent performance:
> sql>select count(*) from mytable_20160415 where timestamp between
> timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
> abort_flag IN (0);
> +--------+
>| L1     |
> +========+
>| 735660 |
> +--------+
> 1 tuple (618.914ms)
> 
> sql>select count(*) from mytable_20160415 where timestamp between
> timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
> abort_flag = 0;
> +--------+
>| L1     |
> +========+
>| 735660 |
> +--------+
> 1 tuple (1.1s)
> 
> The same can be observed with another column, where ">= 4" is much
> slower than the (in my case) equivalent "IN (4,5,6)".
> 
> I guess, for my uses I can always switch to the "in" conditions, because
> all relevant columns should have very low cardinality, but in general,
> this seems odd.
> 
> Is there an explanation for this behaviour, like optimizer problems with
> merge tables? Or maybe a way to avoid it?
> 
> Regards,
> Marc
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |


More information about the users-list mailing list