[MonetDB-users] Very slow group by sql query

Niels Nes Niels.Nes at cwi.nl
Sun Oct 21 16:41:17 CEST 2007


On Sun, Oct 21, 2007 at 04:02:15PM +0200, Arjen van der Meijden wrote:
> On 21-10-2007 14:51 Niels Nes wrote:
> >On Sun, Oct 21, 2007 at 02:33:58PM +0200, Arjen van der Meijden wrote:
> >>Hi list,
> >>
> >>I was just doing some basic queries on MonetDB5/SQL to see if it is 
> >>suitable for my application, I'm doing lots of aggregates on some 
> >>logfile-abstractions. Basically they all boil down to 'how many unique 
> >>visitors and total pageviews where there in period X-Y in section Z'.
> >>
> >Which version of M5/SQL are you using?
> 
> I downloaded/installed the source tarball 'MonetDB-Mars-SuperBall-SR3' 
> and installed it using the monetdb-install.sh-script with the 
> --enable-sql and --enable-optimise switches.
Could your try with the to be released code, ie run

monetdb-install.sh --enable-sql --enable-optimize --nightly=stable

I get slightly different code, which looks faster.

Niels

> 
> The README in that file sais I have these versions (is there another way 
> to get those version numbers?):
> MonetDB-1.18.2
> MonetDB-SQL-2.18.2
> MonetDB-client-1.18.2
> MonetDB5-server-5.0.0
> 
> Its run on top of a virtualized debian linux etch, on top of Xen 3 again 
> on debian etch.
> 
> The base of the system is a Xeon 5310 quad-core, 1.6Ghz machine with 4GB 
> of memory and 10x 500GB sata disk in raid5 on a sas raid controller. The 
> VM with this monetdb-install and the postgresql-install I compare it to, 
> is given two of those cpu-cores and 1.5GB of memory.
> 
> At the time of these queries I was the only user on the entire system 
> (including the other vm's) and as the other results show the data itself 
> can be read very fast since it all fits easily in memory.
> 
> While the query is running, one cpu-core is just occupied 100% with 
> user-time, there is no system time and no i/o-wait. The disks where idle 
> as well.
> 
> >Could you run the same query with trace instead of explain?
> >That could help identifying the problem.
> 
> I did, it took 1780 seconds. I've attached the trace-output, and also
> attached the trace-output for a fast (1.4 sec) query.
> Slow:
> select channelid, count(distinct clientip), count(*)
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> Removing the count(*) its also slow, but I didn't wait for it to finish. 
> A rewrite to something like this was also slow, but again, I didn't wait 
> for the result:
> select channelid, count(*), sum(views)
> from
> (
> select channelid, clientip, count(*) as views
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid, clientip
> ) as f group by channelid
> 
> (in postgresql that one is faster due to the fact that their group by 
> implementation is faster than their distinct)
> 
> The fast query:
> select channelid, count(clientip) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> 
> Best regards,
> 
> Arjen

> sql>trace select channelid, count(clientip) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00' group by channelid;
> #     2 usec#    mdb.setTimer(_2=true)
> #     1 usec#   0  0#    mdb.setFlow(_2=true)
> #    10 usec# 426M  0#    _3:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _7=0)
> #     6 usec#   0  0#    _12:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _13=1)
> #     6 usec#   0  0#    constraints.emptySet(_12=<tmp_4426>bat[:oid,:timestamp]{0})
> #     6 usec#   0  0#    _12:bat[:oid,:timestamp]  := nil;
> #     5 usec#   0  0#    _14:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _15=2)
> #     5 usec#   0  0#    constraints.emptySet(_14=<tmp_4427>bat[:oid,:timestamp]{0})
> #     4 usec#   0  0#    _14:bat[:oid,:timestamp]  := nil;
> #     5 usec# 106M  0#    _19:bat[:oid,:sht]  := sql.bind(_4="sys", _5="pageviews", _20="channelid", _7=0)
> #     5 usec# 426M  0#    _26:bat[:oid,:str]  := sql.bind(_4="sys", _5="pageviews", _27="clientip", _7=0)
> #    12 usec#   0  0#    _8 := calc.timestamp(A0="2007-09-21 00:00:00")
> #     5 usec#   0  0#    _9 := calc.timestamp(A1="2007-09-22 00:00:00")
> #477472 usec#  14M426M#    _10 := algebra.uselect(_3=<tmp_3660>bat[:oid,:timestamp]{55914459}, _8=2007-09-21 00:00:00.000, _9=2007-09-22 00:00:00.000, _11=true, _11=true)
> #    14 usec#   0  0#    _3:bat[:oid,:timestamp]  := nil;
> #    18 usec#  14M 14M#    _17 := algebra.markT(_10=<tmp_4541>bat[:oid,:oid]{1916813}, _16=0 at 0)
> #     5 usec#   0  0#    _10 := nil;
> #     7 usec#  14M 14M#    _18 := bat.reverse(_17=<tmp_4535>bat[:oid,:oid]{1916813})
> #     4 usec#   0  0#    _17 := nil;
> # 17425 usec#   3M121M#    _21 := algebra.join(_18=<~tmp_4535>bat[:oid,:oid]{1916813}, _19=<tmp_3661>bat[:oid,:sht]{55914459})
> # 73145 usec#  14M  3M#    (ext40,grp38):= group.new(_21=<tmp_4540>bat[:oid,:sht]{1916813})
> #  2023 usec#   0  0#    _21 := nil;
> #    31 usec#  80200#    _24 := bat.mirror(ext40=<tmp_4543>bat[:oid,:int]{5})
> #     6 usec#   0  0#    ext40 := nil;
> #    31 usec#  80121M#    _25 := algebra.joinPath(_24=<tmp_4537>bat[:oid,:oid]{5}, _18=<~tmp_4535>bat[:oid,:oid]{1916813}, _19=<tmp_3661>bat[:oid,:sht]{55914459})
> #     4 usec#   0  0#    _19:bat[:oid,:sht]  := nil;
> #754188 usec#  14M441M#    _28 := algebra.join(_18=<~tmp_4535>bat[:oid,:oid]{1916813}, _26=<tmp_3651>bat[:oid,:str]{55914459})
> #  6937 usec#   0  0#    _18 := nil;
> #     9 usec#   0  0#    _26:bat[:oid,:str]  := nil;
> # 38470 usec#  80 29M#    _29:bat[:oid,:int]  := aggr.count_no_nil(_28=<tmp_4543>bat[:oid,:str]{1916813}, grp38=<tmp_4532>bat[:oid,:oid]{1916813}, _24=<tmp_4537>bat[:oid,:oid]{5})
> # 21815 usec#   0  0#    _28 := nil;
> #  7452 usec#   0  0#    grp38 := nil;
> #    11 usec#   0  0#    _24 := nil;
> #    11 usec#   0 80#    _30 := sql.resultSet(_15=2, _13=1, _25=<tmp_4534>bat[:oid,:sht]{5})
> #    11 usec#   0 80#    sql.rsColumn(_30=1, _32="sys.pageviews", _20="channelid", _33="smallint", _34=16, _7=0, _25=<tmp_4534>bat[:oid,:sht]{5})
> #     4 usec#   0  0#    _25 := nil;
> #     6 usec#   0 80#    sql.rsColumn(_30=1, _32="sys.pageviews", _36="count_no_nil_clientip", _37="int", _38=15, _7=0, _29=<tmp_4542>bat[:oid,:int]{5})
> #     5 usec#   0  0#    _29:bat[:oid,:int]  := nil;
> % sys.pageviews,        sys.pageviews # table_name
> % channelid,    count_no_nil_clientip # name
> % smallint,     int # type
> % 1,    7 # length
> [ 0,    538187  ]
> [ 1,    1108478 ]
> [ 4,    42867   ]
> [ 3,    145565  ]
> [ 2,    81716   ]
> #    33 usec#   0  0#    sql.exportResult(_30=1, _40="")
> #1399855 usec#   0  0#    user.s1_1(_5="2007-09-21 00:00:00", _6="2007-09-22 00:00:00")
> #   0  0
> Timer    1401.374 msec

> sql>trace select channelid, count(distinct clientip), count(*)
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> #     9 usec#    mdb.setTimer(_2=true)
> #     1 usec#   0  0#    mdb.setFlow(_2=true)
> #    19 usec# 426M  0#    _3:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _7=0)
> #     6 usec#   0  0#    _12:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _13=1)
> #     5 usec#   0  0#    constraints.emptySet(_12=<tmp_4426>bat[:oid,:timestamp]{0})
> #     6 usec#   0  0#    _12:bat[:oid,:timestamp]  := nil;
> #     5 usec#   0  0#    _14:bat[:oid,:timestamp]  := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _15=2)
> #     4 usec#   0  0#    constraints.emptySet(_14=<tmp_4427>bat[:oid,:timestamp]{0})
> #     4 usec#   0  0#    _14:bat[:oid,:timestamp]  := nil;
> #     6 usec# 106M  0#    _19:bat[:oid,:sht]  := sql.bind(_4="sys", _5="pageviews", _20="channelid", _7=0)
> #     5 usec# 426M  0#    _26:bat[:oid,:str]  := sql.bind(_4="sys", _5="pageviews", _27="clientip", _7=0)
> #    27 usec#   0  0#    _8 := calc.timestamp(A0="2007-09-21 00:00:00")
> #     6 usec#   0  0#    _9 := calc.timestamp(A1="2007-09-22 00:00:00")
> #5670933 usec#  14M426M#    _10 := algebra.uselect(_3=<tmp_3660>bat[:oid,:timestamp]{55914459}, _8=2007-09-21 00:00:00.000, _9=2007-09-22 00:00:00.000, _11=true, _11=true)
> #    12 usec#   0  0#    _3:bat[:oid,:timestamp]  := nil;
> #    33 usec#  14M 14M#    _17 := algebra.markT(_10=<tmp_4532>bat[:oid,:oid]{1916813}, _16=0 at 0)
> #     5 usec#   0  0#    _10 := nil;
> #    14 usec#  14M 14M#    _18 := bat.reverse(_17=<tmp_4533>bat[:oid,:oid]{1916813})
> #     5 usec#   0  0#    _17 := nil;
> # 20206 usec#   3M121M#    _21 := algebra.join(_18=<~tmp_4533>bat[:oid,:oid]{1916813}, _19=<tmp_3661>bat[:oid,:sht]{55914459})
> # 73032 usec#  14M  3M#    (ext40,grp38):= group.new(_21=<tmp_4534>bat[:oid,:sht]{1916813})
> #  1954 usec#   0  0#    _21 := nil;
> #    47 usec#  80200#    _24 := bat.mirror(ext40=<tmp_4536>bat[:oid,:int]{5})
> #     6 usec#   0  0#    ext40 := nil;
> #    53 usec#  80121M#    _25 := algebra.joinPath(_24=<tmp_4537>bat[:oid,:oid]{5}, _18=<~tmp_4533>bat[:oid,:oid]{1916813}, _19=<tmp_3661>bat[:oid,:sht]{55914459})
> #     5 usec#   0  0#    _19:bat[:oid,:sht]  := nil;
> #1116473 usec#  14M441M#    _28 := algebra.join(_18=<~tmp_4533>bat[:oid,:oid]{1916813}, _26=<tmp_3651>bat[:oid,:str]{55914459})
> #  6845 usec#   0  0#    _18 := nil;
> #     7 usec#   0  0#    _26:bat[:oid,:str]  := nil;
> # 96203 usec#  14M 14M#    (_29,grp55):= group.new(grp38=<tmp_4535>bat[:oid,:oid]{1916813})
> #1774153086 usec#  25M 29M#    (_31,grp57):= group.derive(_29=<tmp_4533>bat[:oid,:int]{5}, grp55=<tmp_4532>bat[:oid,:oid]{1916813}, _28=<tmp_4540>bat[:oid,:str]{1916813})
> #    21 usec#   0  0#    _29 := nil;
> #  7605 usec#   0  0#    grp55 := nil;
> # 15287 usec#   4M 10M#    _33 := bat.mirror(_31=<tmp_4542>bat[:oid,:int]{283056})
> #  5978 usec#   0  0#    _31 := nil;
> #121028 usec#   4M 18M#    _34 := algebra.semijoin(_28=<tmp_4540>bat[:oid,:str]{1916813}, _33=<tmp_4533>bat[:oid,:oid]{283056})
> # 21771 usec#   0  0#    _28 := nil;
> #  2345 usec#   0  0#    _33 := nil;
> # 11474 usec#  80 18M#    _35:bat[:oid,:int]  := aggr.count_no_nil(_34=<tmp_4542>bat[:oid,:str]{283056}, grp38=<tmp_4535>bat[:oid,:oid]{1916813}, _24=<tmp_4537>bat[:oid,:oid]{5})
> #  5053 usec#   0  0#    _34 := nil;
> # 16686 usec#  80 29M#    _36:bat[:oid,:int]  := aggr.count(grp38=<tmp_4535>bat[:oid,:oid]{1916813}, grp38=<tmp_4535>bat[:oid,:oid]{1916813}, _24=<tmp_4537>bat[:oid,:oid]{5})
> #  7407 usec#   0  0#    grp38 := nil;
> #     9 usec#   0  0#    _24 := nil;
> #    10 usec#   0 80#    _37 := sql.resultSet(_38=3, _13=1, _25=<tmp_4534>bat[:oid,:sht]{5})
> #    12 usec#   0 80#    sql.rsColumn(_37=0, _40="sys.pageviews", _20="channelid", _41="smallint", _42=16, _7=0, _25=<tmp_4534>bat[:oid,:sht]{5})
> #     5 usec#   0  0#    _25 := nil;
> #     6 usec#   0 80#    sql.rsColumn(_37=0, _40="sys.pageviews", _44="count_no_nil_clientip", _45="int", _46=15, _7=0, _35=<tmp_4533>bat[:oid,:int]{5})
> #     5 usec#   0  0#    _35:bat[:oid,:int]  := nil;
> #     5 usec#   0 80#    sql.rsColumn(_37=0, _40="sys.pageviews", _48="count_channelid", _45="int", _49=32, _7=0, _36=<tmp_4542>bat[:oid,:int]{5})
> #     4 usec#   0  0#    _36:bat[:oid,:int]  := nil;
> % sys.pageviews,        sys.pageviews,  sys.pageviews # table_name
> % channelid,    count_no_nil_clientip,  count_channelid # name
> % smallint,     int,    int # type
> % 1,    6,      7 # length
> [ 0,    77524,  538187  ]
> [ 1,    110681, 1108478 ]
> [ 4,    20120,  42867   ]
> [ 3,    44457,  145565  ]
> [ 2,    30274,  81716   ]
> #    65 usec#   0  0#    sql.exportResult(_37=0, _51="")
> #1781362757 usec#   0  0#    user.s0_1(_5="2007-09-21 00:00:00", _6="2007-09-22 00:00:00")
> #   0  0
> Timer  1784035.472 msec

> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users


-- 

Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02,  phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes at cwi.nl




More information about the users-list mailing list