Yes looks like COUNT(DISTINCT sales_document) is a very expensive operation, the moment I remove DISTINCT and just fired the query like COUNT(sales_doc),sum(shpd_qty),sum(ord_qty)  .i.e with more AGGREGATE functions , its behaving very fast. But the data here might be wrong as we are not counting the DISTINCT sales_document ,so few sales_document might get counted multiple times.

Any other way of achieving the same but without using DISTINCT?

Also I did more testing with MonetDB and this time with Detail line level data (NO Aggregation) and on that front its slow.

I fired the query -  select * from booking where sales_document_line_comb = 12345610 and   , this took 40 Secs even though sales_document_line_comb is Primary KEY for booking table.

Same way I fired many detail queries but all of them are slow.

Any pointers here in terms of increasing the Detail line level perforrmace.


Thanks





On Mon, Jan 19, 2015 at 7:41 PM, Martin Kersten <martin@monetdb.org> wrote:
On 19/01/15 16:27, shamsul hassan wrote:
Hi All,
Hi

Any help /Inputs here are really appreciated.

Thanks

On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan <shamsulbuddy@gmail.com <mailto:shamsulbuddy@gmail.com>> wrote:

    Hi,

    I haven't understand it completely, does it mean that it will always use a single thread while doing Group by.
Every MAL instruction is handled by a single thread.
Parallel processing of aggregates is shown in the plans.

    Also I have noticed one behaviour as below --

    1) Very Fast Execution ---

    Select product_type,sum(ord_qty)  FROM bookings where ord_entry_date between '2014-04-01' and  '2014-10-01'  group by product_type;

    2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --

    select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and  '2014-10-01'  group by product_type;
COUNT(DISTINCT sales_document) is an expensive operation, as all duplicate elimination routines require some form of sorting.


    So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.

    Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs

    Is this behaviour is expected, does anybody else have also experienced such behaviour.
Behaviour can often be explained by understanding the underlying algorithmic complexity of the
relational algebra operators, and the properties that actual columns expose. This calls for
a study of those algorithms (using the general literature on algorithms).

Trial and error won't bring you much further.

regards, Martin
    Can it be because of OCT2014-SP1 release ??

    Thanks


    On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten <martin@monetdb.org <mailto:martin@monetdb.org>> wrote:

        Hi

        Each MAL instruction is executed by a single thread.
        Some operators require the complete input to be available,
        such as in your case the group.subgroup().

        martin

        On 18/01/15 01:27, shamsul hassan wrote:

            Hi All,

            I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.

            Server configuration -

            8 CPU Cores
            64 GB RAM

            I have loaded a data set with 70 columns having 15 Millions Rows.

            I have started mserver5 as below --

            mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb  --set mapi_open=true --set mapi_port=54321  --set merovingian_uri=mapi:monetdb:/__/localhost:54321/testdb --set mapi_usock=/data/monetdb/__testdb/.mapi.sock --set
            monet_vault_key=/data/monetdb/__testdb/.vaultkey  --set sql_optimizer=default_pipe  --set max_clients=64

            Now I have ran a query as below --

            select product_type,COUNT(DISTINCT sales_document),sum(ord_qty)  FROM bookings where ord_entry_date between '2014-04-01' and  '2014-10-01'  group by product_type;

            This query took around 90 Secs ..
            So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.

            My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.

            More info --

            mserver5 --version

            MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)

            Copyright (c) 1993-July 2008 CWI

            Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved

            Visit http://www.monetdb.org/ for further information

            Found 63.0GiB available memory, 8 available cpu cores

            Libraries:

                libpcre: 7.8 2008-09-05 (compiled with 7.8)

                openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)

                libxml2: 2.7.6 (compiled with 2.7.6)

            Compiled by: root@localhost (x86_64-unknown-linux-gnu)

            Compilation: gcc -O3 -fomit-frame-pointer -pipe  -D_FORTIFY_SOURCE=2

            Linking    : /usr/bin/ld -m elf_x86_64

            Any pointers ??

            Thanks



            _________________________________________________
            users-list mailing list
            users-list@monetdb.org <mailto:users-list@monetdb.org>
            https://www.monetdb.org/__mailman/listinfo/users-list <https://www.monetdb.org/mailman/listinfo/users-list>


        _________________________________________________
        users-list mailing list
        users-list@monetdb.org <mailto:users-list@monetdb.org>
        https://www.monetdb.org/__mailman/listinfo/users-list <https://www.monetdb.org/mailman/listinfo/users-list>





_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list


_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list