Mark,
I think there is a bug in the implementation of the PYTHON_MAP aggregate functions.
The return value of a PYTHON_MAP aggregate function can change if there are other functions in the query.
The return value of my function weighted_percentile_0 (definition below) changes if I include the function median() in my query. The results are incorrect if median() isn't in the query, but are correct if median() is in the query. In this test case, the weights (v2) are all 1 and the values (v1) are uniformly sampled integers from [1,10]. Number of rows for each fctr is ~1000, and my function should reduce to min(v1) in this case.
Is some information bleeding between threads? Should I submit a bug report? This is on the default branch.
Thanks,
Dave
sql>select fctr,weighted_percentile_0(v1,v2),min(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+
| fctr | L1 | L2 |
+======+==========================+======+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 1 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 2 | 1 |
| 8 | 1 | 1 |
| 9 | 2 | 1 |
+------+--------------------------+------+
9 tuples (126.928ms)
sql>select fctr,weighted_percentile_0(v1,v2),min(v1),median(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+------+
| fctr | L1 | L2 | L3 |
+======+==========================+======+======+
| 1 | 1 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 1 | 5 |
| 4 | 1 | 1 | 5 |
| 5 | 1 | 1 | 5 |
| 6 | 1 | 1 | 5 |
| 7 | 1 | 1 | 5 |
| 8 | 1 | 1 | 5 |
| 9 | 1 | 1 | 5 |
+------+--------------------------+------+------+
9 tuples (519.195ms)
sql>
CREATE AGGREGATE weighted_percentile_0(a DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON_MAP {
import numpy as np
# Standardize and sort based on values in a
q = np.array([0]) / 100.0
idx = np.argsort(a)
a_sort = a[idx]
w_sort = w[idx]
# Get the cumulative sum of weights
ecdf = np.cumsum(w_sort)
# Find the percentile index positions associated with the percentiles
p = q * (w_sort.sum() - 1)
# Find the bounding indices (both low and high)
idx_low = np.searchsorted(ecdf, p, side='right')
idx_high = np.searchsorted(ecdf, p + 1, side='right')
idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
# Calculate the weights
weights_high = p - np.floor(p)
weights_low = 1.0 - weights_high
# Extract the low/high indexes and multiply by the corresponding weights
x1 = np.take(a_sort, idx_low) * weights_low
x2 = np.take(a_sort, idx_high) * weights_high
wp = np.add(x1,x2)
return(wp[0])
};
hi, monetdb does not support SUM() OVER commands.. does anyone have a
smart alternative to implement this? thanks
SELECT
batch_id,
job_count,
SUM(job_count) OVER (ORDER BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
Hello,
I have been using monetdb starting my db farm with monetdbd and it has always worked fine.
I have tried to start manually mserver5 with the command mserver5 --set embedded_py=true --dbpath=/path/to/db --daemon=yes —set mapi_open=true
The process starts correctly, but when I try to launch mclient it says InvalidCredentialsException:checkCredentials:invalid credentials for user ‘monetdb’, what can I do?
I want to start the database with mserver5 because I need remote access (—set mapi_open=true), is there a way to enable this flag using monetdbd?
Thanks,
Stefano
It seems that the functions of the Geom module is only accessible from the sys schema. That further implies that only users with access to the sys schema can call Geom functions.
What is the best way to ensure that all users can access the functions from the Geom module? Geom types seem to work fine.
Thanks,
Vince
The information transmitted, including any attachments, is intended only for the individual or entity to which it is addressed, and may contain confidential and/or privileged information. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by individuals or entities other than the intended recipient is prohibited, and all liability arising therefrom is disclaimed. If you have received this communication in error, please delete the information from any computer and notify the sender.
Hi there,
when I run 'count' on remote table I get a segfault on worker
server: 'could not find L1.L1 NOT NULL as L1 L1.L1
Segmentation fault'
Here is master(server/client) and worker (server/client), all are living on
localhost (as it was shown in example
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/DistributedQuery…
).
master client
[akravchenko@hnode1 ~]$ mclient -d mst
Welcome to mclient, the MonetDB/SQL interactive terminal (Jun2016-SP1)
Database: MonetDB v11.23.7 (Jun2016-SP1), 'mst'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>CREATE REMOTE TABLE t1 (i int) on 'mapi:monetdb://localhost:50001/rt1';
operation successful (1.357ms)
sql>\d
REMOTE TABLE sys.t1
sql>select * from t1;
+------+
| i |
+======+
| 11 |
| 13 |
+------+
2 tuples (16.205ms)
*sql>select count(*) from t1;*
*an error occurred on connection: Connection lost*
*sql>*
worker client
[akravchenko@hnode1 ~]$ mclient -d rt1 -p 50001
Welcome to mclient, the MonetDB/SQL interactive terminal (Jun2016-SP1)
Database: MonetDB v11.23.7 (Jun2016-SP1), 'rt1'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>\d
TABLE sys.t1
sql>select count(*) from t1;
+------+
| L1 |
+======+
| 2 |
+------+
1 tuple (0.594ms)
worker server
[akravchenko@hnode1 ~]$ mserver5 --dbpath=rt1 --set mapi_port=50001
# MonetDB 5 server v11.23.7 "Jun2016-SP1"
# Serving database 'rt1', using 4 threads
# Compiled for x86_64-redhat-linux-gnu/64bit with 64bit OIDs and 128bit
integers dynamically linked
# Found 31.357 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
# Visit http://www.monetdb.org/ for further information
# Listening for connection requests on mapi:monetdb://127.0.0.1:50001/
# MonetDB/SQL module loaded
*>could not find L1.L1 NOT NULL as L1*
*L1.L1*
*Segmentation fault*
master server
[akravchenko@hnode1 ~]$ mserver5 --dbpath=mst
# MonetDB 5 server v11.23.7 "Jun2016-SP1"
# Serving database 'mst', using 4 threads
# Compiled for x86_64-redhat-linux-gnu/64bit with 64bit OIDs and 128bit
integers dynamically linked
# Found 31.357 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
# Visit http://www.monetdb.org/ for further information
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/SQL module loaded
Thank you,
Anton
Is this possible in MonetDB?
select * from database1.table1 t1, database2.table2 t2 where t1.col1 =
t2.col2
where database1 and database2 are two databases running simultaneously.
Hello,
I am trying to execute in parallel a few queries in MonetDB.
I have read in the documentation that Monet uses the OCC transaction model so read operations should not block the table.
I have 3 CREATE TABLE queries that read from the same source table, I launch 3 different mclient processes and launch one query on each client.
The queries execute in parallel as expected, I can see them running in sys.queue().
The first query that completes, creates the table and commits fine. When the other two queries finish their job I can see the message “operation successful” but right after the client prints "COMMIT: transaction is aborted because of concurrency conflicts, will ROLLBACK instead”.
What am I doing wrong? Apart from the newly created table, I did non write anything to the db so it should work fine.
Any suggestion would be much appreciated,
Thanks,
Stefano
I'm trying to run two MonetDB databases simoultaneously in Windows. I named
the two databases mydb and mydb2.
I changed the database name in M5server.bat to mydb, and added
-database=mydb to mclient.bat
Then I copied the M5server.bat file and changed the farm to mydb2. I also
copied the mclient.bat file and added -database=mydb2.
I run the two server bats to start the two databases and that seems to work
fine (no errors).
Then I run the first mclient.bat and it connects to the server with no
issues.
But when I run the copied mclient.bat and try to connect using
user/password monetdb/monetdb, I get the following error and cannot connect:
request for database 'mydb2', but this is database 'mydb', did you mean to
connect to monetdbd instead?
How to run two databases simoultaneously in Windows? How to connect to two
databases at the same time?
Hello,
I will briefly recap the problem I have already reported a few weeks ago: I am using monetdb default branch (I did not compile the stable version because I need embedded python) and I have a large table (~80 GB). A simple query like SELECT * FROM large_table WHERE field=x, fails.
mserver5 reports this error:
mserver5: gdk_select.c:867: fullscan_int: Assertion `cnt < (bn)->S.capacity' failed.
It seems that monet thinks it doesn’t have enough space to do a full scan of the table, maybe? (The machine has 800GB of free disk space and 30GB of RAM - Ubuntu 14.04).
I was using a default branch of december 2015, I have tried to use the latest default branch (4/08/16), recompiled, installed but nothing, always the same error.
So, as suggested, I have compiled the latest stable release (Jun2016-SP1). Now even a query which does not require monet to read the whole table (like SELECT * FROM large_table LIMIT 10 - This query worked fine in the default branch version) fails with ‘Program contains error’ message by client.
Selecting a single column of the table works just fine.
Moreover, with the stable release, the same issue arises with the second biggest table of the database (~35 GB).
[Here you can find the configuration of MonetDB: http://pastie.org/private/r8gtssg944aqwnigrysa]
All queries on all the other tables in the database (they are all much smaller than these two) work fine. I do not understand if the issue is in the size of table or somewhere else. How can I know what ‘error’ the message ‘program contains errors’ is referring to?
NOTE: At the beginning, using the default compiled branch everything worked just fine! Every query on my large_table worked as expected. Then I had to recreate the database from scratch so I deleted the dbfarm and created a new one. That’s when this headache started…
At this point I don’t know what to do, any help would be MUCH appreciated.
Thanks,
Stefano
Hi,
I try to synchronize data from mysql to monetdb.
Steps:
1. Read the modified rows from mysql table periodically.
2. Delete the rows in monetdb by primary key. (To avoid 'PRIMARY KEY constraint violated' error.) The sql is like 'delete from a where id in (1, 2, 3)'.
3. Insert the rows into monetdb by 'copy into ... from stdin'.
I find step 2 may be too slow. When there are 2 million rows, delete 10,000 rows takes about 1 minute. When there are 30 million rows, delete 10,000 rows takes about 20 minute.
I have to do join on the 30 million rows table, so I cannot split it.
I try to upgrade the server from 2 core 8 G memory to 8 core 32 G memory, but it has no effect. It seems the limitation is CPU, and the deleted operation can only use 1 core.
How can I optimize the performance? Is there any better approach to synchronize data?
Thanks in advance!
OS version: centos-release-7-0.1406.el7.centos.2.5.x86_64
mserver5 --version
MonetDB 5 server v11.23.7 "Jun2016-SP1" (64-bit, 64-bit oids, 128-bit integers)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 31.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.32 2012-11-30 (compiled with 8.32)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: root@jstu87d9w6zj4u (x86_64-unknown-linux-gnu)
Compilation: gcc -g -O2
Linking : /usr/bin/ld -m elf_x86_64
wang.tong(a)shomop.com