Okay - will do, though it'll probably be several hours!


On 29 November 2013 10:54, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Will,

did / can you run your first (slow) query twice in a row and report the speeds of both runs?

Thanks!
Stefan

----- Original Message -----
>
>
> On 29/11/13 11:36, Will Muldrew wrote:
> > Hi
> >
> > I've been trying out monetdb on a 350M row table (modern linux
> > workstation with 24GB RAM + SSD).  This table has a sorted TIMESTAMP
> > column and a second unsorted VARCHAR(6) column which is non-unique
> > (perhaps 50M distinct values fairly randomly distributed).
> >
> > This query is very slow, spends huge amount of time writing, and has
> > very low CPU (intermediate results?):
> >
> >    select count(*) from t_order where orderId = 'XXXXXX'
> >
> > This takes hours to complete.
> >
> > However, this query is a lot faster, even though the timestamp clause is
> > redundant (all my data is > '2012-01-01'):
> It decided it can use parallel processing effectively.
> Why it did not in the first query is unknown.
>
> >
> >    select count(*) from t_order where orderId = 'XXXXXX' and time >
> > '1970-01-01';
> >
> > The first run is slowish (10s of seconds - I guess because we're reading
> > the orderId column fully), however subsequent runs are sub-second.
> Yes, that is caching effect.
> >
> > Obviously I could write an app which puts the spurious timestamp clause
> > into the where, but it's still confusing and not ideal.  Is there
> > something I can do?
> >
> regards, Martin
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |

_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list