[MonetDB-users] Very slow group by sql query

Stefan Manegold Stefan.Manegold at cwi.nl
Sun Oct 21 14:53:27 CEST 2007


Hi Arjen,

while Niels should be the one who can give an educated comment, here's just
a simple preliminary idea to potentially help(?) locating the origin of the
unexpectedly slow performance:

Your "combined" queries differs in two way from the original two queries
that run fast:
(1) there are two aggregations in one query:
	count(distinct clientip) & count(*)
(2) count(distinct clientip) is now combined with a group-by instead of
    being global.

Doing only one modification at a time gives much more info, which of the two
changes might trigger the performance degradation, repectively whether it in
only the combination of the two.

Hence, could you please also run the following two queries and report their
performance?

select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';

select channelid, count(distinct clientip)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;

Thanks in advance!

Stefan


ps: and the "usual" questions are: which version of MonetDB5/SQL are you
    using and on what kind of platform (HW, OS, etc.) are you running it?


On Sun, Oct 21, 2007 at 02:33:58PM +0200, Arjen van der Meijden wrote:
> Hi list,
> 
> I was just doing some basic queries on MonetDB5/SQL to see if it is 
> suitable for my application, I'm doing lots of aggregates on some 
> logfile-abstractions. Basically they all boil down to 'how many unique 
> visitors and total pageviews where there in period X-Y in section Z'.
> 
> I have this table:
> pageviews (
>  timestamp timestamp not null,
>  clientip varchar(15) not null,
>  sectionid smallint not null,
>  itemid integer not null,
>  channelid smallint default 0
> )
> 
> Currently it only contains data for last september, with about 2M 
> records/day, and 5.6M in total.
> 
> There are no additional indexes in this case.
> 
> When doing a query like this, monetdb very fast. Once the data is in the 
> memory cache, it returns (according to trace) in about half a second.
> 
> select count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
> 
> The result is 1916813
> 
> This one is also pretty fast, taking about 1.7 second
> 
> select count(distinct clientip) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
> 
> The result is 165700
> 
> And the third which is pretty fast:
> select channelid, count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> Here's the distribution, and its returned in about 0.6 second
> [ 0,    538187  ]
> [ 1,    1108478 ]
> [ 4,    42867   ]
> [ 3,    145565  ]
> [ 2,    81716   ]
> 
> But when I combine those last two queries, the result isn't returned in 
> a reasonable amount of time, I waited for more than half an hour and it 
> still hadn't returned the results.
> 
> select channelid, count(distinct clientip), count(*)
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> I'm not very good at reading your explain output yet, so I've attached 
> the resulting explain for that query.
> 
> Is there a way to speed up this type of query? It seems a bit odd that 
> it's taking more than half an hour (postgresql does it in about 20 
> seconds) while the other queries return much faster (postgresql does 
> them in about 14 seconds).
> 
> Best regards,
> 
> Arjen

> function user.s5_1():void;
>     _1:bat[:oid,:timestamp]{rows=55914459:lng,bid=1968}  := sql.bind("sys","pageviews","timestamp",0);
>     _10:bat[:oid,:timestamp]{rows=0:lng,bid=2323}  := sql.bind("sys","pageviews","timestamp",1);
>     constraints.emptySet(_10);
>     _10:bat[:oid,:timestamp]{rows=0:lng,bid=2323}  := nil;
>     _12:bat[:oid,:timestamp]{rows=0:lng,bid=2324}  := sql.bind("sys","pageviews","timestamp",2);
>     constraints.emptySet(_12);
>     _12:bat[:oid,:timestamp]{rows=0:lng,bid=2324}  := nil;
>     _17:bat[:oid,:sht]{notnil=true,rows=55914459:lng,bid=1969}  := sql.bind("sys","pageviews","channelid",0);
>     _24:bat[:oid,:str]{rows=55914459:lng,bid=1961}  := sql.bind("sys","pageviews","clientip",0);
>     _8{rows=27957230:lng} := algebra.uselect(_1,2007-09-21 00:00:00.000,2007-09-22 00:00:00.000,true,true);
>     _1:bat[:oid,:timestamp]{rows=55914459:lng,bid=1968}  := nil;
>     _15{rows=27957230:lng} := algebra.markT(_8,0 at 0);
>     _8{rows=27957230:lng} := nil;
>     _16{rows=27957230:lng} := bat.reverse(_15);
>     _15{rows=27957230:lng} := nil;
>     _19{rows=27957230:lng} := algebra.join(_16,_17);
>     (ext40,grp38):= group.new(_19);
>     _19{rows=27957230:lng} := nil;
>     _22 := bat.mirror(ext40);
>     ext40 := nil;
>     _23 := algebra.joinPath(_22,_16,_17);
>     _17:bat[:oid,:sht]{notnil=true,rows=55914459:lng,bid=1969}  := nil;
>     _26{rows=27957230:lng} := algebra.join(_16,_24);
>     _16{rows=27957230:lng} := nil;
>     _24:bat[:oid,:str]{rows=55914459:lng,bid=1961}  := nil;
>     (_27,grp55):= group.new(grp38);
>     (_29,grp57):= group.derive(_27,grp55,_26);
>     _27 := nil;
>     grp55 := nil;
>     _31 := bat.mirror(_29);
>     _29 := nil;
>     _32 := algebra.semijoin(_26,_31);
>     _26{rows=27957230:lng} := nil;
>     _31 := nil;
>     _33:bat[:oid,:int]  := aggr.count_no_nil(_32,grp38,_22);
>     _32 := nil;
>     _34:bat[:oid,:int]  := aggr.count(grp38,grp38,_22);
>     grp38 := nil;
>     _22 := nil;
>     _35 := sql.resultSet(3,1,_23);
>     sql.rsColumn(_35,"sys.pageviews","channelid","smallint",16,0,_23);
>     _23 := nil;
>     sql.rsColumn(_35,"sys.pageviews","count_no_nil_clientip","int",15,0,_33);
>     _33:bat[:oid,:int]  := nil;
>     sql.rsColumn(_35,"sys.pageviews","count_channelid","int",32,0,_34);
>     _34:bat[:oid,:int]  := nil;
>     sql.exportResult(_35,"");
> end s5_1;

> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users


-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |




More information about the users-list mailing list