[MonetDB-users] Very slow group by sql query

Martin Kersten Martin.Kersten at cwi.nl
Sun Oct 21 14:56:05 CEST 2007


Arjen van der Meijden wrote:
Hi Arjen,

Thanks for this detailed report. For completeness, please include the
MonetDB/SQL version and the platform you are working on
(Windows/Linux, hardware).

At first sight, I don't see any suspicious code. This suggests that
some property in the kernel is not set correctly, or that a rewrite in
the SQL front-end went astray.

To find out, we have to rerun the test, possibly with debugging enabled.
When you stumble upon such unexpected cases, 'trace select ...' is often
helpful, because it gives the execution time of each instruction and the
size of the intermediates.

To assess whether a property is set incorrectly, --algorithms and
--xproperties may provide good clues (for the experts ;-)).


regards, Martin

> Hi list,
>
> I was just running some basic queries on MonetDB5/SQL to see if it is 
> suitable for my application. I'm doing lots of aggregates on some 
> logfile abstractions. Basically they all boil down to 'how many unique 
> visitors and total pageviews were there in period X-Y in section Z'.
>
> I have this table:
> pageviews (
>  timestamp timestamp not null,
>  clientip varchar(15) not null,
>  sectionid smallint not null,
>  itemid integer not null,
>  channelid smallint default 0
> )
>
> Currently it only contains data for last September, with about 2M 
> records/day, and 5.6M in total.
>
> There are no additional indexes in this case.
>
> When doing a query like this, MonetDB is very fast. Once the data is in 
> the memory cache, it returns (according to trace) in about half a second.
>
> select count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
>
> The result is 1916813
>
> This one is also pretty fast, taking about 1.7 seconds:
>
> select count(distinct clientip) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
>
> The result is 165700
>
> And a third, which is also pretty fast:
> select channelid, count(*) from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
>
> Here's the distribution; it's returned in about 0.6 seconds:
> [ 0,    538187  ]
> [ 1,    1108478 ]
> [ 4,    42867   ]
> [ 3,    145565  ]
> [ 2,    81716   ]
>
> But when I combine those last two queries, the result isn't returned 
> in a reasonable amount of time. I waited for more than half an hour 
> and it still hadn't returned the results.
>
> select channelid, count(distinct clientip), count(*)
> from pageviews
> where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
> group by channelid;
>
> I'm not very good at reading your explain output yet, so I've attached 
> the resulting explain for that query.
>
> Is there a way to speed up this type of query? It seems a bit odd that 
> it's taking more than half an hour (PostgreSQL does it in about 20 
> seconds) while the other queries return much faster (PostgreSQL does 
> them in about 14 seconds).
>
> Best regards,
>
> Arjen
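
One rewrite that may be worth trying while the slow plan is being
investigated: count the rows per (channelid, clientip) pair in a
subquery first, then aggregate per channel, so that the outer query no
longer needs count(distinct ...). The sketch below only checks, on a
handful of made-up rows in SQLite (through Python's sqlite3 module),
that both formulations return the same result. The sample rows and the
names per_client, hits, visitors and views are illustrative assumptions,
and whether MonetDB actually executes the rewritten form faster has not
been verified here.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption:
# used only to compare results, not to measure performance).
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table pageviews (
        timestamp timestamp not null,
        clientip  varchar(15) not null,
        sectionid smallint not null,
        itemid    integer not null,
        channelid smallint default 0
    )
""")

# Made-up sample rows; the last one falls outside the queried period.
rows = [
    ("2007-09-21 08:00:00", "10.0.0.1", 1, 100, 0),
    ("2007-09-21 08:05:00", "10.0.0.1", 1, 101, 0),
    ("2007-09-21 09:00:00", "10.0.0.2", 2, 102, 0),
    ("2007-09-21 10:00:00", "10.0.0.1", 1, 103, 1),
    ("2007-09-21 11:00:00", "10.0.0.3", 3, 104, 1),
    ("2007-09-20 23:00:00", "10.0.0.4", 1, 105, 1),
]
conn.executemany("insert into pageviews values (?, ?, ?, ?, ?)", rows)

period = ("2007-09-21 00:00:00", "2007-09-22 00:00:00")

# The combined query from the report (order by added for a stable comparison).
original = """
    select channelid, count(distinct clientip), count(*)
    from pageviews
    where timestamp between ? and ?
    group by channelid
    order by channelid
"""

# Two-level grouping: the inner query yields one row per (channelid, clientip),
# so the outer count(*) is the number of distinct clients and sum(hits)
# is the total number of pageviews. This relies on clientip being not null.
rewritten = """
    select channelid, count(*) as visitors, sum(hits) as views
    from (
        select channelid, clientip, count(*) as hits
        from pageviews
        where timestamp between ? and ?
        group by channelid, clientip
    ) as per_client
    group by channelid
    order by channelid
"""

result_original = conn.execute(original, period).fetchall()
result_rewritten = conn.execute(rewritten, period).fetchall()
print(result_original)
print(result_rewritten)
```

Running the rewritten query under 'trace' next to the original one
should show whether the intermediates stay smaller on the real data.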