Hi there,
We have an issue with remote tables on the 2016 MonetDB releases (SP1 and SP2),
on Ubuntu 14.04 and on Windows Server 2012 R2.
Here is a scenario for reproducing the issue with two nodes.
on node1
Create the table that will serve as the remote table and fill it:
CREATE TABLE test(id int not null, name text, valid boolean);
INSERT INTO test (id, name) VALUES (1, '1');
INSERT INTO test (id, name) VALUES (2, '2');
INSERT INTO test (id) VALUES (3);
on node2
CREATE REMOTE TABLE test(id int not null, name text, valid boolean) ON
'mapi:monetdb://node1:50000/dbfarm';
Then on node2:
select * from test;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 1 | 1 | null |
| 2 | 2 | null |
| 3 | null | null |
+------+------+-------+
It works fine, but:
select * from test where name is null;
+----+------+-------+
| id | name | valid |
+====+======+=======+
+----+------+-------+
The row with id 3 should appear here. Furthermore:
select * from test where name is not null;
(mapi:monetdb://monetdb@192.168.254.31/reports2) Cannot
register
project
(
select
(
table(sys.test) [ test.id NOT NULL, test.name, test.valid ]
COUNT
) [ clob "NULL" ! <= test.name ! <= clob "NULL"
]
) [ test.id NOT NULL, test.name, test.valid ] REMOTE
mapi:monetdb://.../...
select * from test where valid is null;
illegal input, not a JSON header (got '')
and node1 crashes (we then need to run: monetdb start farm).
After downgrading to the 2015 release (SP4) on Ubuntu 14.04, this scenario works fine.
Thanks,
SG
Mark,
I think there is a bug in the implementation of the PYTHON_MAP aggregate functions.
The return value of a PYTHON_MAP aggregate function can change when other functions appear in the same query.
The return value of my function weighted_percentile_0 (definition below) changes when I include median() in the query: the results are incorrect when median() is absent, but correct when it is present. In this test case the weights (v2) are all 1 and the values (v1) are integers sampled uniformly from [1,10]; there are ~1000 rows per fctr, so my function should reduce to min(v1).
Is some information bleeding between threads? Should I submit a bug report? This is on the default branch.
Thanks,
Dave
sql>select fctr,weighted_percentile_0(v1,v2),min(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+
| fctr | L1 | L2 |
+======+==========================+======+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 1 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 2 | 1 |
| 8 | 1 | 1 |
| 9 | 2 | 1 |
+------+--------------------------+------+
9 tuples (126.928ms)
sql>select fctr,weighted_percentile_0(v1,v2),min(v1),median(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+------+
| fctr | L1 | L2 | L3 |
+======+==========================+======+======+
| 1 | 1 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 1 | 5 |
| 4 | 1 | 1 | 5 |
| 5 | 1 | 1 | 5 |
| 6 | 1 | 1 | 5 |
| 7 | 1 | 1 | 5 |
| 8 | 1 | 1 | 5 |
| 9 | 1 | 1 | 5 |
+------+--------------------------+------+------+
9 tuples (519.195ms)
sql>
CREATE AGGREGATE weighted_percentile_0(a DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON_MAP {
    import numpy as np
    # Standardize and sort based on the values in a
    q = np.array([0]) / 100.0
    idx = np.argsort(a)
    a_sort = a[idx]
    w_sort = w[idx]
    # Get the cumulative sum of the weights
    ecdf = np.cumsum(w_sort)
    # Find the index positions associated with the percentiles
    p = q * (w_sort.sum() - 1)
    # Find the bounding indices (both low and high)
    idx_low = np.searchsorted(ecdf, p, side='right')
    idx_high = np.searchsorted(ecdf, p + 1, side='right')
    idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
    # Calculate the interpolation weights
    weights_high = p - np.floor(p)
    weights_low = 1.0 - weights_high
    # Extract the low/high values and multiply by the corresponding weights
    x1 = np.take(a_sort, idx_low) * weights_low
    x2 = np.take(a_sort, idx_high) * weights_high
    wp = np.add(x1, x2)
    return wp[0]
};
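As a sanity check that the aggregate body itself reduces to min() for the 0th percentile, the same code can be run as a plain Python function on small made-up arrays, outside MonetDB:

```python
import numpy as np

def weighted_percentile_0(a, w):
    # Same body as the aggregate above, run as an ordinary function.
    q = np.array([0]) / 100.0
    idx = np.argsort(a)
    a_sort = a[idx]
    w_sort = w[idx]
    ecdf = np.cumsum(w_sort)
    p = q * (w_sort.sum() - 1)
    idx_low = np.searchsorted(ecdf, p, side='right')
    idx_high = np.searchsorted(ecdf, p + 1, side='right')
    idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
    weights_high = p - np.floor(p)
    weights_low = 1.0 - weights_high
    x1 = np.take(a_sort, idx_low) * weights_low
    x2 = np.take(a_sort, idx_high) * weights_high
    return np.add(x1, x2)[0]

a = np.array([5.0, 1.0, 3.0, 2.0])
w = np.ones(4)
print(weighted_percentile_0(a, w))  # 1.0, i.e. min(a)
```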
Hi, MonetDB does not support SUM() OVER window functions. Does anyone have a
smart alternative for implementing this? Thanks.
SELECT
batch_id,
job_count,
SUM(job_count) OVER (ORDER BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
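Since the thread asks for an alternative: a common workaround on engines without window functions is a correlated scalar subquery (or self-join) that sums job_count over all rows up to the current duration. A minimal sketch, using SQLite as a stand-in and made-up sample data; the same SQL shape should be adaptable to MonetDB, though that is untested here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_data (batch_id INTEGER, job_count INTEGER, duration REAL)")
con.executemany("INSERT INTO test_data VALUES (?, ?, ?)",
                [(1, 10, 1.0), (2, 20, 2.0), (3, 30, 3.0)])

# cumjobs: running total ordered by duration; totjobs: grand total.
rows = con.execute("""
    SELECT t1.batch_id,
           t1.job_count,
           (SELECT SUM(t2.job_count) FROM test_data t2
             WHERE t2.duration <= t1.duration) AS cumjobs,
           (SELECT SUM(t3.job_count) FROM test_data t3) AS totjobs,
           t1.duration
      FROM test_data t1
     ORDER BY t1.duration
""").fetchall()
print(rows)  # [(1, 10, 10, 60, 1.0), (2, 20, 30, 60, 2.0), (3, 30, 60, 60, 3.0)]
```

Note the correlated form assumes duration values are distinct; ties would need a tie-breaking column in the WHERE clause.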
Hi all,
Has anyone had any luck building the sources with Python 3?
I tried building the Dec2016 branch from git, passing
--with-python3=/usr/bin/python3 --with-pyversion=/usr/bin/python3
--with-pyconfig=/usr/bin/python3-config
to configure, but received many errors during make under the pyapi
directory, such as:
/usr/include/python3.5m/object.h:86:5: error: missing braces around
initializer [-Werror=missing-braces]
{ _PyObject_EXTRA_INIT \
^
connection.c:159:5: note: in expansion of macro ‘PyObject_HEAD_INIT’
PyObject_HEAD_INIT(NULL)
^
/usr/include/python3.5m/object.h:86:5: note: (near initialization for
‘Py_ConnectionType.ob_base’)
{ _PyObject_EXTRA_INIT \
^
connection.c:159:5: note: in expansion of macro ‘PyObject_HEAD_INIT’
PyObject_HEAD_INIT(NULL)
^
connection.c:159:5: error: missing initializer for field ‘ob_size’ of
‘PyVarObject {aka struct <anonymous>}’ [-Werror=missing-field-initializers]
.................
connection.c:246:5: error: ‘return’ with a value, in function returning
void [-Werror]
import_array();
^
It looks like some of the macro definitions needed to make the Python 3 header
files work with the pyapi sources might be missing. I tried fixing some of the
sources, like unicode.c and formatinput.c, but it looks like the majority of
the source files have issues.
Thanks,
Joseph
Hello,
We’ve been battling for some time with what is fundamentally a simple query.
We have a table with about 15 columns that store int and float values.
The query filters each column with an IN condition, where the number of test values in the IN list varies from 1 to 200; at 200, the list covers all possible values of that column.
So normally a query would look something like:
SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6) AND c IN (2,5,6) ... AND i = 5 etc.
The table has about 2 million rows but may potentially grow to 10 million.
A query similar to the one above, with about 50 test values in each IN list, takes about 80-100 ms.
So far, the best results we have gotten are with MariaDB and proper indexing using the Memory engine, where the above query takes about 2 ms.
We suspect MonetDB is not optimised for such queries, but since we don’t know much about its internals, can you please confirm this? Otherwise, is there a way to improve the above query, say by using multiple JOINs instead of IN?
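To make the JOIN idea concrete: one rewrite is to materialize each IN list as a small table and join against it. A minimal sketch with SQLite as a stand-in and made-up table/column names; whether this is actually faster in MonetDB would need measuring:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 1, 4), (2, 2, 9), (3, 3, 5)])

# Original IN form.
in_rows = con.execute(
    "SELECT id FROM t WHERE a IN (1, 2, 3) AND b IN (4, 5)").fetchall()

# Equivalent JOIN form: each IN list becomes a small lookup table.
con.execute("CREATE TABLE vals_a (a INTEGER)")
con.executemany("INSERT INTO vals_a VALUES (?)", [(1,), (2,), (3,)])
con.execute("CREATE TABLE vals_b (b INTEGER)")
con.executemany("INSERT INTO vals_b VALUES (?)", [(4,), (5,)])
join_rows = con.execute(
    "SELECT t.id FROM t JOIN vals_a USING (a) JOIN vals_b USING (b)").fetchall()

print(in_rows, join_rows)  # both contain ids 1 and 3
```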
By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it.
Thank you very much,
Silviu
Hi,
I'm using MonetDB v11.23.13 (Jun2016-SP2). Two years ago we decided to use
MonetDB to store DNA marker data for our plant breeding platform
(www.kddart.org), based on MonetDB's data-retrieval performance; its
data-update performance is also good. Our use of MonetDB is not purely OLAP:
there are a lot of OLTP aspects, because people need to load marker data from
their genotyping services. We have finished integrating MonetDB into the
platform and are developing more tools for the system. As the platform
matures, we have started concurrency testing against the system, including
the parts that rely on MonetDB.
Inevitably, I ran into the concurrency problem described in a master's thesis
by Rene Tassy (Improving Transactional Scalability in MonetDB). The work done
in the thesis is very promising, and I would like to thank you all for that.
My question, in relation to how I should deal with my problem, is: what is
the time frame for the improved MVCC optimistic concurrency strategy to
appear in a MonetDB release?
Regards,
Puthick
Hi!
We are trying to create a couple of functions in MonetDB. Every function we
create performs terribly badly.
Can someone help me find out what I'm doing wrong?
Here is the function
CREATE FUNCTION sb_glob.camelcase(input VARCHAR(500))
RETURNS VARCHAR(500)
BEGIN
    DECLARE len INT;
    DECLARE i INT;
    SET len = CHAR_LENGTH(input);
    SET input = LOWER(input);
    SET i = 0;
    WHILE (i < len) DO
        SET input = CASE
            WHEN ((SUBSTR(input, i, 1) = ' ' OR SUBSTR(input, i, 1) = '-' OR i = 0) AND (i < len))
            THEN (LEFT(input, i) || UPPER(SUBSTR(input, i + 1, 1)) || RIGHT(input, len - i - 1))
            ELSE input
        END;
        SET i = i + 1;
    END WHILE;
    RETURN input;
END
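One way to avoid the per-row interpreted WHILE loop above is to push the same rule into a scalar helper in another language (for example a MonetDB Python UDF). A rough Python sketch of the same capitalization rule (uppercase at the start and after a space or hyphen); the function name is illustrative:

```python
def camelcase(s):
    # Lowercase everything, then uppercase the first character and any
    # character that follows a space or a hyphen (same rule as the SQL
    # function above).
    out = []
    cap = True
    for ch in s.lower():
        out.append(ch.upper() if cap else ch)
        cap = ch in ' -'
    return ''.join(out)

print(camelcase("jOHN smith-JONES"))  # John Smith-Jones
```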
Thanks in advance!
Hello all -
I'm having trouble pulling rows based on the value of a json key/value pair. I have created a table with a json object that looks like this:
sql>\d testjson
CREATE TABLE "sys"."testjson" (
"chr" INTEGER,
"pos" INTEGER,
"value" JSON
);
sql>
I have entries in the table that look as below:
sql>select * from testjson;
+------+-----------+----------------------------------------------------------------+
| chr | pos | value |
+======+===========+================================================================+
| 1 | 209890809 | {"df":1,"pval":4.556021744872574E-6,"r2":0.0995066009518681} |
| 1 | 789383847 | {"df":1,"pval":2.50962115178055E-6,"r2":0.10462833261259474} |
| 2 | 127893782 | {"df":1,"pval":4.2825829011938765E-6,"r2":0.10003907080878045} |
| 2 | 234533212 | {"df":2,"pval":4.556021744872574E-6,"r2":0.0995066009518681} |
+------+-----------+----------------------------------------------------------------+
4 tuples (1.027ms)
sql>
I would like to filter on JSON values: not the JSON key, but the JSON key's value. I can pull JSON values based on a non-JSON field, e.g. grab the r2 value where chr=1 as below:
sql>select json.filter(value,'r2') as r2Value from testjson where chr=1;
+-----------------------+
| r2value |
+=======================+
| [0.0995066009518681] |
| [0.10462833261259474] |
+-----------------------+
2 tuples (1.179ms)
sql>
But I want to pull chr/pos where pval < some amount, or df > some number. For example, how would I write the query to select all rows where df > 1? This query should return the last row of the table above. I've tried the following:
sql>select * from testjson where (json.filter(value,'df') > 1);
types json(0,0) and tinyint(8,0) are not equal
sql>
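One client-side workaround (not MonetDB-specific, and only a sketch): fetch the rows and filter on the decoded JSON in the client. The data below is abbreviated from the table above:

```python
import json

# Rows as (chr, pos, value) tuples, copied from the table above.
rows = [
    (1, 209890809, '{"df":1,"pval":4.556021744872574e-6,"r2":0.0995066009518681}'),
    (1, 789383847, '{"df":1,"pval":2.50962115178055e-6,"r2":0.10462833261259474}'),
    (2, 127893782, '{"df":1,"pval":4.2825829011938765e-6,"r2":0.10003907080878045}'),
    (2, 234533212, '{"df":2,"pval":4.556021744872574e-6,"r2":0.0995066009518681}'),
]

# Select (chr, pos) where df > 1.
hits = [(c, p) for c, p, v in rows if json.loads(v)["df"] > 1]
print(hits)  # [(2, 234533212)]
```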
Thanks for your help - Lynn
hi monetdb team,
I've pasted a minimal reproducible example of a few SQL queries that I had
expected to work. Only about half of them do work; the other half just error
with 'ParseException:SQLparser:'.
My goal is to get the very last query working, so if anybody could suggest a
hack or a workaround that doesn't involve a JOIN, I would appreciate it.
Thanks, all!
CREATE TABLE table_one (this_column INTEGER) ;
INSERT INTO table_one VALUES (1) ;
INSERT INTO table_one VALUES (2) ;
INSERT INTO table_one VALUES (3) ;
INSERT INTO table_one VALUES (4) ;
INSERT INTO table_one VALUES (5) ;
CREATE TABLE table_two (this_column INTEGER) ;
INSERT INTO table_two VALUES (1) ;
INSERT INTO table_two VALUES (2) ;
# works as expected
SELECT SUM( ( table_one.this_column IN ( SELECT this_column FROM table_two
) ) ) FROM table_one ;
SELECT MAX( ( table_one.this_column IN ( SELECT this_column FROM table_two
) ) ) FROM table_one ;
# fails
SELECT ( table_one.this_column IN ( SELECT this_column FROM table_two ) )
AS new_column FROM table_one ;
# Server says 'ParseException:SQLparser:'.
# works
SELECT AVG( CASE WHEN table_one.this_column IN ( SELECT this_column FROM
table_two ) THEN 1 ELSE 0 END ) AS new_column FROM table_one ;
# fails
SELECT ( CASE WHEN ( table_one.this_column IN ( SELECT this_column FROM
table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM table_one ;
# Server says 'ParseException:SQLparser:'.
# fails [[this query is the one i would like to implement]]
SELECT COUNT(*) , ( CASE WHEN ( table_one.this_column IN ( SELECT
this_column FROM table_two ) ) THEN 1 ELSE 0 END ) AS new_column FROM
table_one GROUP BY new_column ;
# Server says 'ParseException:SQLparser:'.
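A standard workaround for the last query is to compute the CASE expression in a derived table and GROUP BY the resulting column. Whether MonetDB's parser accepts this is untested here, so the sketch uses SQLite as a stand-in:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_one (this_column INTEGER);
    INSERT INTO table_one VALUES (1), (2), (3), (4), (5);
    CREATE TABLE table_two (this_column INTEGER);
    INSERT INTO table_two VALUES (1), (2);
""")

# The CASE/IN expression is computed in a derived table, so the outer
# query can GROUP BY a plain column instead of a subquery expression.
rows = con.execute("""
    SELECT COUNT(*), new_column
      FROM (SELECT CASE WHEN this_column IN (SELECT this_column FROM table_two)
                        THEN 1 ELSE 0 END AS new_column
              FROM table_one) AS sub
     GROUP BY new_column
     ORDER BY new_column
""").fetchall()
print(rows)  # [(3, 0), (2, 1)]
```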
Hello All,
Just wanted to report a bug, or at least unexpected behavior. I am new to
MonetDB, and could not submit this to your Bugzilla tracker.
*Background*
I successfully have used MonetDB with Docker (
https://www.monetdb.org/blog/docker-container-monetdb-r) and was able to
create FUNCTIONs and AGGREGATEs with R (as described here:
https://www.monetdb.org/content/embedded-r-monetdb). This blog post also
states:
"As a final note on usage, you can use any R package from CRAN within your
function without explicitly installing it. We overloaded the library function
in R to automatically install missing packages."
*Description*
However, I am not able to use external R packages available from CRAN. The
session below shows both the Docker commands and the SQL used. The session
was run on OS X using Docker Machine.
Thank you,
-Cas
bash-3.2$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED          STATUS          PORTS                      NAMES
d6b944445237   monetdb/monetdb-r-docker:latest   "/usr/bin/supervisord"   13 minutes ago   Up 13 minutes   0.0.0.0:32768->50000/tcp   monetdb-r-docker
bash-3.2$ docker exec -it monetdb-r-docker mclient db
user:monetdb
password:
Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP4)
Database: MonetDB v11.21.19 (Jul2015-SP4), 'mapi:monetdb://d6b944445237:50000/db'
Type \q to quit, \? for a list of available commands
auto commit mode: on
# Create and use a function to show that MonetDB with basic R expressions works:
sql>CREATE FUNCTION test_add() RETURNS DOUBLE LANGUAGE R { 1 + 1 };
operation successful (1.254ms)
sql>SELECT test_add();
+--------------------------+
| test_add |
+==========================+
| 2 |
+--------------------------+
# Add only the library() call to the function just shown (and give the
# function a new name). The function is created successfully, but an error
# occurs when the function is invoked in a query:
sql>CREATE FUNCTION test_library() RETURNS DOUBLE LANGUAGE R { library("ggplot2"); 1 + 1 };
operation successful (1.378ms)
sql>SELECT test_library();
Error running R expression. Error message: Error in
.library.original(package, help, pos, lib.loc, character.only = T, :
there is no package called 'ggplot2'
Calls: as.data.frame -> <Anonymous> -> library -> .library.original
sql>