Bug 6661

Summary: Unusual behaviour when no_mitosis_pipe flag is set
Product: SQL Reporter: Jayanth Jain <jayanth037>
Component: allAssignee: SQL devs <bugs-sql>
Status: NEW ---    
Severity: major CC: lazumis, mk, shawpolakcrax12
Priority: Normal    
Version: 11.31.11 (Aug2018-SP1)   
Hardware: Other   
OS: Windows   
Attachments: The zip file to reproduce the issue
The zip file to reproduce the issue

Description Jayanth Jain 2018-11-19 15:59:19 CET
User-Agent:       Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Build Identifier: 

We observed an unusual behavior in MonetDB. We found that when SQL optimizer flag is set to no_mitosis_pipe, range queries perform much better than when the SQL optimizer is set to default-pipe. 
We tested it with 2 million random values and we got a result of 25ms for default_pipe compared to 1ms in no_mitosis_pipe.
We would expect that the default_pipe should be faster than the other pipeline, as it executed the query in parallel.
However, you can see that the opposite is true and the difference is quite significant.

We have attached a zip file which contains a python script issue.py, which creates a table of type varchar and adds two million random strings. 
Then, it executes the same range queries for both default-pipe and no_mitosis_pipe. 
Further, it generates default-Res and nomitosis-Res files which stores the result of default_pipe and no_mitosis_pipe respectively. 

We also noticed that as the number of duplicates increase, the execution time of default-pipe further increases. We tested it with 2 million records with only 110 unique values and got a result of 200ms for default-pipe compared to 1.5ms in no_mitosis_pipe.
This issue can also be tested by changing the arguments inputFile to 2000000dup and queryFile to QueryDup in issue.py python script.

The setup of monetDB is as follows:
MonetDB version: 11.32
Ubuntu version: 16.04
Installed from: self-installed and compiled
Script: issue.py (present in monetDBIssue.zip)
To test for unique values: Use 2000000sort as table data and QueryUnique as query file (present in monetDBIssue.zip)
To test for duplicate values: Use 2000000dup as table data and QueryDup as query file(present in monetDBIssue.zip)


Reproducible: Always

Steps to Reproduce:
1.Unzip the monetDBIssue.zip file
2.Execute the issue.py file inside monetDBIssue directory
3.grep for sql time from nomitosis-Res and default-Res file
default-Res execution time is more than nomitosis-Res file
Actual Results:  
The execution time of range queries in default_pipe is greater than no_mitosis_pipe.

Expected Results:  
The execution time of range queries in no_mitosis_pipe should be greater than default_pipe.
Comment 1 Jayanth Jain 2018-11-19 16:04:54 CET
Created attachment 608 [details]
The zip file to reproduce the issue
Comment 2 Jayanth Jain 2018-11-19 16:06:29 CET
Created attachment 609 [details]
The zip file to reproduce the issue
Comment 3 Sjoerd Mullender cwiconfidential 2018-11-19 16:32:30 CET
1. What is encchar1?  I see in file createTable that you try to create a table enctest with a column of type encchar1.

2. Did you accidentally attach the same file (monetDBIssue.zip) twice instead of attaching it once and some other file with the data 2000000sort and 2000000dup?
Comment 4 Jayanth Jain 2018-11-19 16:37:59 CET
(In reply to Sjoerd Mullender from comment #3)
> 1. What is encchar1?  I see in file createTable that you try to create a
> table enctest with a column of type encchar1.
> 
> 2. Did you accidentally attach the same file (monetDBIssue.zip) twice
> instead of attaching it once and some other file with the data 2000000sort
> and 2000000dup?

I tried to attach the data 2000000sort and 2000000dup. But it is bigger than 1000Kb. Please could you let me know how can I attach that file.
Comment 5 Sjoerd Mullender cwiconfidential 2018-11-19 17:14:42 CET
(In reply to Jayanth Jain from comment #4)
> I tried to attach the data 2000000sort and 2000000dup. But it is bigger than
> 1000Kb. Please could you let me know how can I attach that file.

You can't.  The database behind bugzilla (mariadb, I'm afraid) would not be happy.

Can you put it somewhere where I can download it from?
Comment 6 Jayanth Jain 2018-11-19 17:19:20 CET
(In reply to Sjoerd Mullender from comment #5)
> (In reply to Jayanth Jain from comment #4)
> > I tried to attach the data 2000000sort and 2000000dup. But it is bigger than
> > 1000Kb. Please could you let me know how can I attach that file.
> 
> You can't.  The database behind bugzilla (mariadb, I'm afraid) would not be
> happy.
> 
> Can you put it somewhere where I can download it from?

Ok. Thanks for the response. You can donwload the files from the following link  
 https://drive.google.com/drive/folders/1ZZM6buxYCHMtATINn7J_trtYdF43rI5R?usp=sharing . You can remove the enctest table. By mistake I have added it.
Comment 7 MonetDB Mercurial Repository cwiconfidential 2018-11-21 13:14:34 CET
Changeset 56bbe74324f4, made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.

For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=56bbe74324f4

Changeset description:

	Two small fixes to fix bug 6661.
	- Reduce time spent in micro sleep waiting for a BAT to become available.
	- Do not unload a BAT which has views on it.
Comment 8 Sjoerd Mullender cwiconfidential 2018-11-21 13:28:57 CET
I've made a couple of changes to improve the speed of the queries when run with the default pipeline.  It's still not as fast as the no_mitosis_pipe, but it's certainly better.
The biggest problem was that we were too eager to unload BATs from memory when they weren't being used.  The problem was, the BAT wasn't used for a very short time (basically the time between two instructions), so it was loaded again right away.  By being less eager, we now don't unload the BAT nearly as often in the default case which improved the speed considerably.
The other change was to poll more frequently when waiting for a BAT to be unloaded before loading it again (something you don't want to have to do at all, but isn't easy to avoid completely).

With or without these changes, running the query with the no_mitosis_pipe takes about 0.4 ms.
Without the changes, the query takes about 20 ms with the default_pipe.
With the changes, the time went down to 5 ms.
Comment 9 Jayanth Jain 2018-11-21 18:41:22 CET
(In reply to Sjoerd Mullender from comment #5)
> (In reply to Jayanth Jain from comment #4)
> > I tried to attach the data 2000000sort and 2000000dup. But it is bigger than
> > 1000Kb. Please could you let me know how can I attach that file.
> 
> You can't.  The database behind bugzilla (mariadb, I'm afraid) would not be
> happy.
> 
> Can you put it somewhere where I can download it from?

Ok. Thanks for the response. You can donwload the files from the following link  
 https://drive.google.com/drive/folders/1ZZM6buxYCHMtATINn7J_trtYdF43rI5R?usp=sharing . You can remove the enctest table. By mistake I have added it.(In reply to Sjoerd Mullender from comment #8)
> I've made a couple of changes to improve the speed of the queries when run
> with the default pipeline.  It's still not as fast as the no_mitosis_pipe,
> but it's certainly better.
> The biggest problem was that we were too eager to unload BATs from memory
> when they weren't being used.  The problem was, the BAT wasn't used for a
> very short time (basically the time between two instructions), so it was
> loaded again right away.  By being less eager, we now don't unload the BAT
> nearly as often in the default case which improved the speed considerably.
> The other change was to poll more frequently when waiting for a BAT to be
> unloaded before loading it again (something you don't want to have to do at
> all, but isn't easy to avoid completely).
> 
> With or without these changes, running the query with the no_mitosis_pipe
> takes about 0.4 ms.
> Without the changes, the query takes about 20 ms with the default_pipe.
> With the changes, the time went down to 5 ms.

Thank you for your response and for the fix. The fix mitigates the problem but doesn't fully solve the issue. Please let me know if you are planning to reduce the execution time of defaul_pipe further and bring it close to no_mitosis_pipe.
Comment 10 Martin Kersten cwiconfidential 2019-03-18 23:43:47 CET
Perhaps we can improve the speed by simply ignoring mitosis when any of the columns used is sorted or the first operation involves a range select.

Instead we could perform mitosis on the next largest table.
The ratio is that most tables are either first range selected or become the target of a join.
Comment 12 TobyCampos 2020-10-10 12:13:09 CEST
Bug catching from https://essaywriter.org/school-homework-help is the most difficult wok to do I life. You will not be able to look at the bug by just seeing the code. YOu firs have to dry your program then you would be able to catch the bug. Bugs are the most irritating thing of the code.