[MonetDB-users] Very slow group by sql query

Arjen van der Meijden acmmailing at tweakers.net
Sun Oct 21 16:02:15 CEST 2007


On 21-10-2007 14:51 Niels Nes wrote:
> On Sun, Oct 21, 2007 at 02:33:58PM +0200, Arjen van der Meijden wrote:
>> Hi list,
>>
>> I was just doing some basic queries on MonetDB5/SQL to see if it is 
>> suitable for my application, I'm doing lots of aggregates on some 
>> logfile-abstractions. Basically they all boil down to 'how many unique 
>> visitors and total pageviews where there in period X-Y in section Z'.
>>
> Which version of M5/SQL are you using?

I downloaded/installed the source tarball 'MonetDB-Mars-SuperBall-SR3' 
and installed it using the monetdb-install.sh-script with the 
--enable-sql and --enable-optimise switches.

The README in that file sais I have these versions (is there another way 
to get those version numbers?):
MonetDB-1.18.2
MonetDB-SQL-2.18.2
MonetDB-client-1.18.2
MonetDB5-server-5.0.0

Its run on top of a virtualized debian linux etch, on top of Xen 3 again 
on debian etch.

The base of the system is a Xeon 5310 quad-core, 1.6Ghz machine with 4GB 
of memory and 10x 500GB sata disk in raid5 on a sas raid controller. The 
VM with this monetdb-install and the postgresql-install I compare it to, 
is given two of those cpu-cores and 1.5GB of memory.

At the time of these queries I was the only user on the entire system 
(including the other vm's) and as the other results show the data itself 
can be read very fast since it all fits easily in memory.

While the query is running, one cpu-core is just occupied 100% with 
user-time, there is no system time and no i/o-wait. The disks where idle 
as well.

> Could you run the same query with trace instead of explain?
> That could help identifying the problem.

I did, it took 1780 seconds. I've attached the trace-output, and also
attached the trace-output for a fast (1.4 sec) query.
Slow:
select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;

Removing the count(*) its also slow, but I didn't wait for it to finish. 
A rewrite to something like this was also slow, but again, I didn't wait 
for the result:
select channelid, count(*), sum(views)
from
(
select channelid, clientip, count(*) as views
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid, clientip
) as f group by channelid

(in postgresql that one is faster due to the fact that their group by 
implementation is faster than their distinct)

The fast query:
select channelid, count(clientip) from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;


Best regards,

Arjen
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trace-fast-group-by.txt
URL: <http://www.monetdb.org/pipermail/users-list/attachments/20071021/90faf8f0/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trace-slow-group-by.txt
URL: <http://www.monetdb.org/pipermail/users-list/attachments/20071021/90faf8f0/attachment-0001.txt>


More information about the users-list mailing list