[MonetDB-users] Profiling queries

Niels Nes Niels.Nes at cwi.nl
Wed Dec 5 08:07:46 CET 2007

On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
> Hello all-
> First of all thanks to you guys for your great help these past few
> weeks.  I really appreciate it.  I've learned a bit, and hopefully
> your product is improving from my bug reports.
> I've managed to pull a bit of my data from my proprietary db into
> monetdb. (Have some 60 million rows in the fact table, pulled in 5
> million)!  I even ran a query against it and got the same numbers
> (though in a different order)!
> So I did a very basic poor mans benchmark.  My python code can hit the
> old db or monetdb by changing one line.  So I took timestamps around
> the query execution portion.
> The old db takes 0.21 seconds when cold.  Subsequent queries take
> 0.027-0.072 seconds
> Monetdb takes 0.67 seconds when cold and 0.12-0.23 otherwise.
> It appears that the old db is faster when cold than monetdb is after
> being primed.
> A few more caveats.  (Yes, this is a poor benchmark but after reading
> pdfs saying that monetdb was 7 times faster than a 64 way oracle box,
> I thought I'd get quite a bit of speedup.)  The old db is running
> under vmware on a Core2 1.86Ghz box.  Monetdb is running on my laptop,
> a Pentium M 1.86 Ghz machine.
cpu wise this would allready give you almost an par results as a core2 is
quite a bit faster than a pentium M. An important thing would be to
know the memory size of your laptop and the memory requirements of
your query. What kind of columns (type of the columns) are involved.

But lets try to get to the problem here. 

First thing to make sure if there is no overhead in the python code, ie
is it MonetDB/SQL server side or client side.

Could you therefor 
1) give the number of rows return (doesn't have to be exact)
2) try with mclient -t

> Since the query is confidential I can't really put it here, but I can
> describe it.  I'm pulling stuff out of my fact table and two other
> dimension tables.  The select clause sums a few columns, and on one
> sums the product of one column and the sum of a few others.  The
> coalesce function is used to convert NULLs to 0.  The where statement
> joins the tables using the = operator and then has two "IN" clauses.
You may have hit a problem with our "IN" clause handling. Are the
sizes sets your doing the IN against large?
> Then there is a group by (since I'm aggregating in the select) and an
> order by.
Are there any (foreign/primary/unique) key's involved?

We would like your to see your query/data. If its all too confidential
maybe you could anonimize it. 
> Any hints on how to increase performance?  Do I need to tune my query
> or tweak other things?  (Yes I know the laptop machine is not apples
> to apples to the vmware machine)
What you could do is run your query with 'trace' infront and see which
lower level monetdb statement takes a lot of time.
Maybe we can without disclosing your data extra fix the performance problem.

> Am open to suggestions.
> thanks much,
> -matt
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell.  From the desktop to the data center, Linux is going
> mainstream.  Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users


Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02,  phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes at cwi.nl

More information about the users-list mailing list