[MonetDB-users] Profiling queries
Niels.Nes at cwi.nl
Wed Dec 5 08:40:31 CET 2007
On Wed, Dec 05, 2007 at 12:25:36AM -0700, m h wrote:
> On Dec 5, 2007 12:07 AM, Niels Nes <Niels.Nes at cwi.nl> wrote:
> > On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
> > >
> > > A few more caveats. (Yes, this is a poor benchmark but after reading
> > > pdfs saying that monetdb was 7 times faster than a 64 way oracle box,
> > > I thought I'd get quite a bit of speedup.) The old db is running
> > > under vmware on a Core2 1.86Ghz box. Monetdb is running on my laptop,
> > > a Pentium M 1.86 Ghz machine.
> > cpu wise this would allready give you almost an par results as a core2 is
> > quite a bit faster than a pentium M. An important thing would be to
> > know the memory size of your laptop and the memory requirements of
> > your query. What kind of columns (type of the columns) are involved.
> Like I said, it's not apples to apples. Tomorrow I'll deploy monetdb
> on the same machine running the vm'd db. (But not under vmware, so
> again not the exactly the same, but the hardware now should be faster
> than the old db hardware since no virtualization).
> My laptop has 1.5Gig and Monetdb was using between 200M to 1+ Gigs.
> (It eats memory during loading the data. I've got 67 Million rows to
> load and after 4 Million I was swapping like crazy, but I'll save that
> for another email/ticket). After a restart memory goes down to 200M
> again and I can query without stopping.
As I don't known your schema/data types I don't know if 1+ GB is much.
Monetdb likes memory ;-).
> Machine running the old db (under VM) only has 400M allocated to it
> (and also has 67 Million rows)!
But your old db is block oriented. And your data is somehow organised
that that doesn't hurt much too (any sorting or cluster indices?).
> You know what I'm willing to give you anonymous schemas and queries.
That would help a lot.
> I don't know what to do with 67 Million rows in the fact table
Well uploading should be a problem, but de-personalizing tyour
67 M fact table may be problem.
> > But lets try to get to the problem here.
> > First thing to make sure if there is no overhead in the python code, ie
> > is it MonetDB/SQL server side or client side.
> Good point. Yes, I've been thinking that said python code for the
> other db might be a bit faster than monetdb's python code. One thing
> to check out is profiling the python side.
> > Could you therefor
> > 1) give the number of rows return (doesn't have to be exact)
> Current query is returning 90 rows
> > 2) try with mclient -t
> I tried that before blowing away the data and got like 3 seconds
3 seconds is very high, -t gives times in millisecends.
> (guess) but I also have 90 rows of output scrolling... Is there a way
Simply redirect output to a file always helps.
> to disable output, so you only measure the impact of the query and not
> displaying the results?
> I'll have to try it again tomorrow, I re-installed the latest build to
> test the order by and blew away my data.
> > >
> > > Since the query is confidential I can't really put it here, but I can
> > > describe it. I'm pulling stuff out of my fact table and two other
> > > dimension tables. The select clause sums a few columns, and on one
> > > sums the product of one column and the sum of a few others. The
> > > coalesce function is used to convert NULLs to 0. The where statement
> > > joins the tables using the = operator and then has two "IN" clauses.
> > You may have hit a problem with our "IN" clause handling. Are the
> > sizes sets your doing the IN against large?
> > > Then there is a group by (since I'm aggregating in the select) and an
> > > order by.
> > Are there any (foreign/primary/unique) key's involved?
> Primary and unique yes, no foreign.
No foreign make the joins expensive afcourse, are these single column
joins are are these joins over multiple columns?
> > We would like your to see your query/data. If its all too confidential
> > maybe you could anonimize it.
> Am willing to work with you guys. I'd like to come up with something
> but the 67 Rows of data is around 3Gigs compressed.... Perhaps I can
> find a way to get you some data...
We have a ftp service at cwi. Any way we could/should discuss this of the
> > >
> > > Any hints on how to increase performance? Do I need to tune my query
> > > or tweak other things? (Yes I know the laptop machine is not apples
> > > to apples to the vmware machine)
> > What you could do is run your query with 'trace' infront and see which
> > lower level monetdb statement takes a lot of time.
> > Maybe we can without disclosing your data extra fix the performance problem.
> Yeah, I ran with TRACE and tons of data scrolled by. I need to learn
> what it actually means (ran over my xterm buffer....). The
> intermediate language looks like perl to me... (sorry perl people ;))
So now i'm offended. Perl is not really a language I appriciate.
Same here redirect to a file and view with your favourite editor.
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell. From the desktop to the data center, Linux is going
> mainstream. Let it simplify your IT future.
> MonetDB-users mailing list
> MonetDB-users at lists.sourceforge.net
Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels e-mail: Niels.Nes at cwi.nl
More information about the users-list