Hi Manish
the real reason that multi-threaded execution is slower than single-threaded execution *in your very case* is the following:
You have ~10m rows and the result of your grouped-aggregation query has ~9.5M rows (groups) (i.e., most of the groups have only 1 or at most few members; the count of 10-16 in the top-10 you show suggests a (very) long tail of singleton groups ...).
Multi-threading in MonetDB essentially slices the largest base-table in as many slices as there are cores / threads to be used, and tries to replicate as much as possible of the query plan over these slices, before combining the partial results into a final single query result. This works fine and effectively in most cases. However, group-by queries may be a problem. With group-by queries, the grouping and aggregation is first done individually for each slice (in parallel). Then, to get the final result, the individual results are concatenated and grouped and aggregated in a single thread, again. This works fine in case there are (only) few groups. Then, the final ("extra") grouping/aggregation has to handle only few ("pre-aggregated") groups from the individual slices. However, with many groups, this extra grouping/aggregation can become expensive, in your particular case with almost as many groups as original input tuples/rows, this extra single-threaded grouping/aggregation is as expensive as the single group by/aggregation in the single-threaded case, essentially making the multi-threaded case almost twice as expensive as the single-threaded case. For the time being (?) MonetDB has no knowledge how many groups a group-by will produce, and thus cannot automatically decide when to run a group-by query or at least the/which group-by operation(s) in a query single-threaded or multi-threaded --- I'm afraid even ANALYZED does not produce sufficient statistics to derive from that the number of groups (possibly "deep" in the query plan).
In general, though, multi-threaded execution should indeed be faster than single-threaded; however, limited potential for parallelization (e.g., due to operators that cannot be parallelized (by MonetDB (yet?)) and thus implicitly serialize the remainder of the query execution plan) and/or resource congestion can limit the effect of multi- threaded execution, or even eliminate it, resulting in multi-threaded being not better than single-threaded. Multi-threaded becoming worse than single-threaded should be the rare exception, mainly in cases sketched above.
Hope this helps.
Best, Stefan
ps: to understand in detail why you mergetable on a single machine is slower than a single base table, you also might use TRACE and analyze (or share) the output ... I suspect the same large group by result problem as sketched above.
----- On Mar 23, 2017, at 2:47 AM, Manish gupta gahoimnshg@gmail.com wrote:
Hello Martin, Thanks for response. Is there any recommendation on what should be mode of running monetdb server? Given I will be dealing with 100 times bigger than this data ( but similar in nature ). Since the memory is sufficient ( 128GB physical ) to have relevant data in memory after first call. Does the decision depend on the properties of data? Should I first ANALYZE the data and check certain patterns to get the best usage of 16 cores that I will be having?
Regards, Manish
On Thu, Mar 23, 2017 at 6:35 AM, Martin Kersten < martin.kersten@cwi.nl > wrote:
No it doesn;t. See the traces, the plans are different. Blocking operators limit multithreaded work and at some point the cores fight over resources.
regards, Martin On 23/03/2017 01:54, Manish gupta wrote:
Dear All, Sorry for spamming. Can anyone suggest how single thread is faster than 4 threads through trace output I attached in previous append. I want to confirm that in general multi thread performance should be faster for monetdb, and it should scale with increasing cores.
Regards, Mohit.
On Tue, Mar 21, 2017 at 9:23 PM, Manish gupta < gahoimnshg@gmail.com <mailto: gahoimnshg@gmail.com >> wrote:
Hello Jennie, Thanks for responding. Here are the answers for the questions you had asked
- you can prefix your queries with the TRACE keyword to see where the time went
I am attaching Trace output. 4.trace is TRACE from 4 threads ( --set "gdk_nr_threads=4" ), 1.trace files is TRACE from 1 threads monet server run ( --set "gdk_nr_threads=1"). I was expecting 4 threads run to be ~2X faster, but its slower than 1 thread run.
- are the times from running the queries once, e.g. for the first time? have
you tried to run the queries repeatedly to see if there are any changes in the execution time? No, I am taking these snapshots after 3-4 times prior runs, so that the data is in memory, and no IO,
- are there any other applications running on your windows machine?
No, apart from Windows, there is no other application running on the machine.
- does your windows run in a VM or on physical hardware?
Its running on Physical hardware on my windows machine.
Please let me know if more info is required. In the meantime, I am trying to use tomograph utility.
Regards, Mohit
On Tue, Mar 21, 2017 at 1:25 PM, Ying Zhang < Y.Zhang@cwi.nl <mailto: Y.Zhang@cwi.nl >> wrote:
Hai Manish,
For the use case you described here, using merge table won’t help the performance, because, as Martin mentioned, MonetDB would already use all cores for the single base table in your merge table.
However, the performance degradation is a bit surprising. Some following up questions:
- you can prefix your queries with the TRACE keyword to see where the time went
- are the times from running the queries once, e.g. for the first time? have
you tried to run the queries repeatedly to see if there are any changes in the execution time? 3. are there any other applications running on your windows machine? 4. does your windows run in a VM or on physical hardware?
Regards, Jennie
On 19 Mar 2017, at 04:25, Manish gupta < gahoimnshg@gmail.com <mailto: gahoimnshg@gmail.com >> wrote:
Dear all, I am checking the performance of merge table vs single table on single machine with 4 cores. The fact table contains ~10million records. I am using # MonetDB 5 server v11.25.9 "Dec2016-SP2" on windows machine ( core i7 16GB physical ram ). I have created a merge table which contains fact table and 3 blank copies of fact table. I am seeing surprisingly slow performance of merge table queries compared to base table queries.
Base table query
sql>select sum(f_1_1_1), count(*), f_1_1_7,f_1_1_5 from t3760_279_1_1 group by f_1_1_7,f_1_1_5 order by sum(f_1_1_1) desc limit 10; +-----------+------+---------+---------+ | L2 | L5 | f_1_1_7 | f_1_1_5 | +===========+======+=========+=========+ | 125885351 | 14 | SALE | 8084869 | | 125629268 | 15 | SALE | 6852518 | | 121733416 | 16 | SALE | 5440605 | | 110328868 | 13 | SALE | 4842411 | | 100848236 | 11 | SALE | 8068509 | | 100719291 | 16 | SALE | 5822736 | | 93127875 | 11 | SALE | 6529612 | | 91034094 | 13 | SALE | 3401293 | | 90766181 | 10 | SALE | 8084900 | | 88848574 | 10 | SALE | 2660811 | +-----------+------+---------+---------+ 10 tuples (44.2s)
Merge Table Query
sql>select sum(f_1_1_1), count(*), f_1_1_7,f_1_1_5 from mt279_1_1 group by f_1_1_7,f_1_1_5 order by sum(f_1_1_1) desc limit 10; +-----------+------+---------+---------+ | L2 | L5 | f_1_1_7 | f_1_1_5 | +===========+======+=========+=========+ | 125885351 | 14 | SALE | 8084869 | | 125629268 | 15 | SALE | 6852518 | | 121733416 | 16 | SALE | 5440605 | | 110328868 | 13 | SALE | 4842411 | | 100848236 | 11 | SALE | 8068509 | | 100719291 | 16 | SALE | 5822736 | | 93127875 | 11 | SALE | 6529612 | | 91034094 | 13 | SALE | 3401293 | | 90766181 | 10 | SALE | 8084900 | | 88848574 | 10 | SALE | 2660811 | +-----------+------+---------+---------+ 10 tuples (1m 51s)
For the same results ( because merge table is containing the base table and 3 blank copies of base table ), merge table query is taking more than 2.5 times the base table runtime. Why is it so?
Thanks, Mohit
users-list mailing list users-list@monetdb.org <mailto: users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list < https://www.monetdb.org/mailman/listinfo/users-list >
users-list mailing list users-list@monetdb.org <mailto: users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list < https://www.monetdb.org/mailman/listinfo/users-list >
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list