[MonetDB-users] Very slow group by sql query

Arjen van der Meijden acmmailing at tweakers.net
Sun Oct 21 14:33:58 CEST 2007


Hi list,

I was just running some basic queries on MonetDB5/SQL to see whether it is 
suitable for my application. I'm doing lots of aggregates on some 
logfile-abstractions. Basically they all boil down to 'how many unique 
visitors and total pageviews were there in period X-Y in section Z'.

I have this table:
pageviews (
  timestamp timestamp not null,
  clientip varchar(15) not null,
  sectionid smallint not null,
  itemid integer not null,
  channelid smallint default 0
)

Currently it only contains data for last September, with about 2M 
records/day, and 5.6M in total.

There are no additional indexes in this case.

When doing a query like this, MonetDB is very fast. Once the data is in the 
memory cache, it returns (according to trace) in about half a second.

select count(*) from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';

The result is 1916813

This one is also pretty fast, taking about 1.7 seconds:

select count(distinct clientip) from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';

The result is 165700

And the third, which is also pretty fast:
select channelid, count(*) from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;

Here's the distribution; it's returned in about 0.6 seconds:
[ 0,    538187  ]
[ 1,    1108478 ]
[ 4,    42867   ]
[ 3,    145565  ]
[ 2,    81716   ]

But when I combine those last two queries, the result isn't returned in 
a reasonable amount of time. I waited for more than half an hour and it 
still hadn't returned the results.

select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;

I'm not very good at reading your explain output yet, so I've attached 
the resulting explain for that query.

Is there a way to speed up this type of query? It seems a bit odd that 
it's taking more than half an hour (PostgreSQL does it in about 20 
seconds) while the other queries return much faster (PostgreSQL does 
them in about 14 seconds).
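
One rewrite that might be worth trying computes the two aggregates of the 
slow query separately and joins them on channelid. This is only an 
untested sketch: it assumes MonetDB/SQL accepts subqueries in the FROM 
clause, and the aliases (v, d, u, views, visitors) are made up for 
illustration.

```sql
-- Untested sketch: split the combined aggregate into two grouped
-- subqueries and join them on channelid.
-- Note: channelid is nullable, so rows with a NULL channelid would be
-- dropped by the join (the distribution above shows only 0..4).
select v.channelid, u.visitors, v.views
from (
    -- total pageviews per channel
    select channelid, count(*) as views
    from pageviews
    where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
    group by channelid
) as v
join (
    -- unique visitors per channel: deduplicate (channelid, clientip)
    -- pairs first, then count the remaining rows per channel
    select d.channelid, count(*) as visitors
    from (
        select distinct channelid, clientip
        from pageviews
        where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
    ) as d
    group by d.channelid
) as u
on v.channelid = u.channelid;
```

For non-NULL channelid values this should return the same rows as the 
combined query, since clientip is declared not null and counting 
deduplicated (channelid, clientip) pairs is then equivalent to 
count(distinct clientip) per channel.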

Best regards,

Arjen
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: explain-slow-group-by.txt
URL: <http://www.monetdb.org/pipermail/users-list/attachments/20071021/2b9215db/attachment.txt>

