[MonetDB-users] Very slow group by sql query

Niels Nes Niels.Nes at cwi.nl
Sun Oct 21 14:51:47 CEST 2007


On Sun, Oct 21, 2007 at 02:33:58PM +0200, Arjen van der Meijden wrote:
> Hi list,
> 
> I was just doing some basic queries on MonetDB5/SQL to see if it is 
> suitable for my application, I'm doing lots of aggregates on some 
> logfile-abstractions. Basically they all boil down to 'how many unique 
> visitors and total pageviews where there in period X-Y in section Z'.
> 
Which version of M5/SQL are you using?

> I have this table:
> pageviews (
>  timestamp timestamp not null,
>  clientip varchar(15) not null,
>  sectionid smallint not null,
>  itemid integer not null,
>  channelid smallint default 0
> )
> 
> Currently it only contains data for last september, with about 2M 
> records/day, and 5.6M in total.
> 
> There are no additional indexes in this case.
> 
> When doing a query like this, monetdb very fast. Once the data is in the 
> memory cache, it returns (according to trace) in about half a second.
> 
> select count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
> 
> The result is 1916813
> 
> This one is also pretty fast, taking about 1.7 second
> 
> select count(distinct clientip) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
> 
> The result is 165700
> 
> And the third which is pretty fast:
> select channelid, count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> Here's the distribution, and its returned in about 0.6 second
> [ 0,    538187  ]
> [ 1,    1108478 ]
> [ 4,    42867   ]
> [ 3,    145565  ]
> [ 2,    81716   ]
> 
> But when I combine those last two queries, the result isn't returned in 
> a reasonable amount of time, I waited for more than half an hour and it 
> still hadn't returned the results.
Thats indeed taking way to long.
> 
> select channelid, count(distinct clientip), count(*)
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
> 
> I'm not very good at reading your explain output yet, so I've attached 
> the resulting explain for that query.
Could you run the same query with trace instead of explain?
That could help identifying the problem.

Niels
> 
> Is there a way to speed up this type of query? It seems a bit odd that 
> it's taking more than half an hour (postgresql does it in about 20 
> seconds) while the other queries return much faster (postgresql does 
> them in about 14 seconds).
> 
> Best regards,
> 
> Arjen

> function user.s5_1():void;
>     _1:bat[:oid,:timestamp]{rows=55914459:lng,bid=1968}  := sql.bind("sys","pageviews","timestamp",0);
>     _10:bat[:oid,:timestamp]{rows=0:lng,bid=2323}  := sql.bind("sys","pageviews","timestamp",1);
>     constraints.emptySet(_10);
>     _10:bat[:oid,:timestamp]{rows=0:lng,bid=2323}  := nil;
>     _12:bat[:oid,:timestamp]{rows=0:lng,bid=2324}  := sql.bind("sys","pageviews","timestamp",2);
>     constraints.emptySet(_12);
>     _12:bat[:oid,:timestamp]{rows=0:lng,bid=2324}  := nil;
>     _17:bat[:oid,:sht]{notnil=true,rows=55914459:lng,bid=1969}  := sql.bind("sys","pageviews","channelid",0);
>     _24:bat[:oid,:str]{rows=55914459:lng,bid=1961}  := sql.bind("sys","pageviews","clientip",0);
>     _8{rows=27957230:lng} := algebra.uselect(_1,2007-09-21 00:00:00.000,2007-09-22 00:00:00.000,true,true);
>     _1:bat[:oid,:timestamp]{rows=55914459:lng,bid=1968}  := nil;
>     _15{rows=27957230:lng} := algebra.markT(_8,0 at 0);
>     _8{rows=27957230:lng} := nil;
>     _16{rows=27957230:lng} := bat.reverse(_15);
>     _15{rows=27957230:lng} := nil;
>     _19{rows=27957230:lng} := algebra.join(_16,_17);
>     (ext40,grp38):= group.new(_19);
>     _19{rows=27957230:lng} := nil;
>     _22 := bat.mirror(ext40);
>     ext40 := nil;
>     _23 := algebra.joinPath(_22,_16,_17);
>     _17:bat[:oid,:sht]{notnil=true,rows=55914459:lng,bid=1969}  := nil;
>     _26{rows=27957230:lng} := algebra.join(_16,_24);
>     _16{rows=27957230:lng} := nil;
>     _24:bat[:oid,:str]{rows=55914459:lng,bid=1961}  := nil;
>     (_27,grp55):= group.new(grp38);
>     (_29,grp57):= group.derive(_27,grp55,_26);
>     _27 := nil;
>     grp55 := nil;
>     _31 := bat.mirror(_29);
>     _29 := nil;
>     _32 := algebra.semijoin(_26,_31);
>     _26{rows=27957230:lng} := nil;
>     _31 := nil;
>     _33:bat[:oid,:int]  := aggr.count_no_nil(_32,grp38,_22);
>     _32 := nil;
>     _34:bat[:oid,:int]  := aggr.count(grp38,grp38,_22);
>     grp38 := nil;
>     _22 := nil;
>     _35 := sql.resultSet(3,1,_23);
>     sql.rsColumn(_35,"sys.pageviews","channelid","smallint",16,0,_23);
>     _23 := nil;
>     sql.rsColumn(_35,"sys.pageviews","count_no_nil_clientip","int",15,0,_33);
>     _33:bat[:oid,:int]  := nil;
>     sql.rsColumn(_35,"sys.pageviews","count_channelid","int",32,0,_34);
>     _34:bat[:oid,:int]  := nil;
>     sql.exportResult(_35,"");
> end s5_1;

> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users


-- 

Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02,  phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes at cwi.nl




More information about the users-list mailing list