Re: 回复:Re: Some performance oddities in numerical comparisons on merge tables

Martin Kersten martin.kersten at cwi.nl
Sun Apr 17 18:05:46 CEST 2016


On 17/04/16 17:10, chenkaijiang at bigknow.com.cn wrote:
> I understand that if MonetDB uses the timestamp as a filter (more selective), then it is faster. But does the MonetDB get this speed by purely sequence scan on timestamp column with 3,773,559,881 rows, or does it use some technique such as binary search?
>
If the underlying column happens to be sorted then, yes MonetDB would use binary search.
Have a look at 'select * from storage()' to see if the column is indeed sorted
>
>  > Total row numbers are:
>  > merge table: 3,773,559,881
>  > current partition: 30,754,353
>  >
>  > I get the the following performance numbers:
>  > sql>select count(*) from mytable where timestamp between timestamp
>  > '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
>  > in (0);
>  > +--------+
>  > | L1 |
>  > +========+
>  > | 735660 |
>  > +--------+
>  > 1 tuple (1.0s)
>
> ----- 原始邮件 -----
> 发件人:Niels Nes <Niels.Nes at cwi.nl>
> 收件人:Communication channel for MonetDB users <users-list at monetdb.org>
> 主题:Re: Some performance oddities in numerical comparisons on merge tables
> 日期:2016-4-17 18:31:11
>
> On Fri, Apr 15, 2016 at 05:18:27PM +0200, Marc Ballarin wrote:
>  > Hi,
>  >
>  > I have a larger *merge table*, of which the relevant columns in this
>  > query should be:
>  > "timestamp" TIMESTAMP
>  > "abort_flag" TINYINT
>  >
>  > "timestamp" is roughly ascending, but far from monotonous.
>  > "abort_flag" has a low cardinality with the following possible values
>  > and number of occurrences:
>  > | 0 | 735660 |
>  > | 1 | 30119 |
>  > | 2 | 1486 |
>  > | 4 | 15 |
>  >
>  > Both columns are completely populated.
>  >
>  > Total row numbers are:
>  > merge table: 3,773,559,881
>  > current partition: 30,754,353
>  >
>  > I get the the following performance numbers:
>  > sql>select count(*) from mytable where timestamp between timestamp
>  > '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
>  > in (0);
>  > +--------+
>  > | L1 |
>  > +========+
>  > | 735660 |
>  > +--------+
>  > 1 tuple (1.0s)
>  >
>  > sql>select count(*) from mytable where timestamp between timestamp
>  > '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and abort_flag
>  > = 0;
>  > +--------+
>  > | L1 |
>  > +========+
>  > | 735660 |
>  > +--------+
>  > 1 tuple (34.0s)
>  >
>  > So there is a very significant, and IMHO surprising, performance
>  > difference between the "equals" and the "in" comparison.
> Well not that surprising. The choice on which operator in MonetDB isn't
> based on the content of the table but using relatively simpel rules. A
> equality is done before a range, and a range before an in.
> So in case of the in, the very selective range query is done first.
> In case of the equality, the equality is done first.
>
> Niels
>  >
>  > Doing the same query directly against the sub table has a much more
>  > consistent performance:
>  > sql>select count(*) from mytable_20160415 where timestamp between
>  > timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
>  > abort_flag IN (0);
>  > +--------+
>  > | L1 |
>  > +========+
>  > | 735660 |
>  > +--------+
>  > 1 tuple (618.914ms)
>  >
>  > sql>select count(*) from mytable_20160415 where timestamp between
>  > timestamp '2016-04-15 13:05:05' and timestamp '2016-04-15 13:20:05' and
>  > abort_flag = 0;
>  > +--------+
>  > | L1 |
>  > +========+
>  > | 735660 |
>  > +--------+
>  > 1 tuple (1.1s)
>  >
>  > The same can be observed with another column, where ">= 4" is much
>  > slower than the (in my case) equivalent "IN (4,5,6)".
>  >
>  > I guess, for my uses I can always switch to the "in" conditions, because
>  > all relevant columns should have very low cardinality, but in general,
>  > this seems odd.
>  >
>  > Is there an explanation for this behaviour, like optimizer problems with
>  > merge tables? Or maybe a way to avoid it?
>  >
>  > Regards,
>  > Marc
>  > _______________________________________________
>  > users-list mailing list
>  > users-list at monetdb.org
>  > https://www.monetdb.org/mailman/listinfo/users-list
>
> --
> Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
> Science Park 123, 1098 XG Amsterdam, The Netherlands
> room L3.14, phone ++31 20 592-4098 sip:4098 at sip.cwi.nl
> url: https://www.cwi.nl/people/niels e-mail: Niels.Nes at cwi.nl
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
>
>
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>



More information about the users-list mailing list