Mark,
I think there is a bug in the implementation of PYTHON_MAP aggregate functions: the return value of a PYTHON_MAP aggregate can change when other functions appear in the same query.
The return value of my function weighted_percentile_0 (definition below) changes if I include median() in the query: the results are incorrect when median() is absent, but correct when it is present. In this test case the weights (v2) are all 1 and the values (v1) are integers sampled uniformly from [1,10]. There are ~1000 rows per fctr, so my function should reduce to min(v1) in this case.
Is some information bleeding between threads? Should I submit a bug report? This is on the default branch.
Thanks,
Dave
sql>select fctr,weighted_percentile_0(v1,v2),min(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+
| fctr | L1 | L2 |
+======+==========================+======+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 1 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 2 | 1 |
| 8 | 1 | 1 |
| 9 | 2 | 1 |
+------+--------------------------+------+
9 tuples (126.928ms)
sql>select fctr,weighted_percentile_0(v1,v2),min(v1),median(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+------+
| fctr | L1 | L2 | L3 |
+======+==========================+======+======+
| 1 | 1 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 1 | 5 |
| 4 | 1 | 1 | 5 |
| 5 | 1 | 1 | 5 |
| 6 | 1 | 1 | 5 |
| 7 | 1 | 1 | 5 |
| 8 | 1 | 1 | 5 |
| 9 | 1 | 1 | 5 |
+------+--------------------------+------+------+
9 tuples (519.195ms)
sql>
CREATE AGGREGATE weighted_percentile_0(a DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON_MAP {
    import numpy as np
    # Standardize and sort based on the values in a
    q = np.array([0]) / 100.0
    idx = np.argsort(a)
    a_sort = a[idx]
    w_sort = w[idx]
    # Get the cumulative sum of weights
    ecdf = np.cumsum(w_sort)
    # Find the index positions associated with the percentiles
    p = q * (w_sort.sum() - 1)
    # Find the bounding indices (both low and high)
    idx_low = np.searchsorted(ecdf, p, side='right')
    idx_high = np.searchsorted(ecdf, p + 1, side='right')
    idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
    # Calculate the interpolation weights
    weights_high = p - np.floor(p)
    weights_low = 1.0 - weights_high
    # Extract the low/high entries and combine with the weights
    x1 = np.take(a_sort, idx_low) * weights_low
    x2 = np.take(a_sort, idx_high) * weights_high
    wp = np.add(x1, x2)
    return wp[0]
};
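As a sanity check outside the database, the same NumPy logic can be run in plain Python: with unit weights and the 0th percentile it reduces to the minimum, which is what the first query should have returned. A standalone sketch; the random data below stands in for the mini table:

```python
import numpy as np

def weighted_percentile_0(a, w):
    # Same body as the PYTHON_MAP aggregate above
    q = np.array([0]) / 100.0
    idx = np.argsort(a)
    a_sort = a[idx]
    w_sort = w[idx]
    ecdf = np.cumsum(w_sort)
    p = q * (w_sort.sum() - 1)
    idx_low = np.searchsorted(ecdf, p, side='right')
    idx_high = np.searchsorted(ecdf, p + 1, side='right')
    idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
    weights_high = p - np.floor(p)
    weights_low = 1.0 - weights_high
    x1 = np.take(a_sort, idx_low) * weights_low
    x2 = np.take(a_sort, idx_high) * weights_high
    return np.add(x1, x2)[0]

rng = np.random.default_rng(0)
v1 = rng.integers(1, 11, size=1000).astype(float)  # values in [1,10]
v2 = np.ones(1000)                                 # unit weights
result = weighted_percentile_0(v1, v2)
print(result, v1.min())
```

Run standalone (one thread, one group), the function agrees with min(v1), which supports the suspicion that the discrepancy comes from how the server combines the per-thread map results.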
Hi, MonetDB does not support SUM() OVER window functions. Does anyone have a
smart alternative to implement this? Thanks
SELECT
batch_id,
job_count,
SUM(job_count) OVER (ORDER BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
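Until window functions are available, a common workaround is to rewrite the window sums as correlated scalar subqueries. A sketch of the rewrite using Python's sqlite3 with made-up sample rows; the same scalar-subquery form should work in MonetDB, assuming distinct duration values (performance characteristics will of course differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_data (batch_id INT, job_count INT, duration INT)")
con.executemany("INSERT INTO test_data VALUES (?, ?, ?)",
                [(1, 10, 5), (2, 20, 3), (3, 30, 8)])  # illustrative rows

# A correlated scalar subquery emulates SUM(job_count) OVER (ORDER BY duration);
# an uncorrelated one emulates SUM(job_count) OVER ().
rows = con.execute("""
    SELECT t.batch_id,
           t.job_count,
           (SELECT SUM(x.job_count) FROM test_data x
             WHERE x.duration <= t.duration) AS cumjobs,
           (SELECT SUM(job_count) FROM test_data) AS totjobs,
           t.duration
      FROM test_data t
     ORDER BY t.duration
""").fetchall()
for r in rows:
    print(r)
```

Note that the running sum is quadratic in the number of rows, so for large tables exporting the data and computing the cumulative sum client-side may be faster.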
Hi,
Following MonetDB Solutions' suggestion, we are asking about this topic here.
We are studying how MonetDB stores its database on disk.
So far we have been able to understand pretty much all the files (BAT,
journal, BBP.dir, heap, hash, imprints, and so on). We have custom tools
to decode and dump them (experimental at this point).
One thing, however, is unclear: we thought that the "delta BATs",
which are the BATs assigned to an SQL table to list the OIDs of rows
that were deleted (named "D_<schema>_<table>" internally), would contain
only *unique* values (since you can never delete the same row twice).
But on several databases that we are running, we found that some of
these BATs contain duplicates. Lots of duplicates, actually. Especially
in the D_sys__columns and D_sys__tables BATs (for the sys._columns and
sys._tables system tables, respectively). Some databases do not have
this "issue" (while they all have the same schema, and process the same
kind of data as the other ones).
Can someone explain whether duplicates are expected in these BATs?
Here is an excerpt of D_sys__tables:
# hexdump -C 02/210.tail
00000000 2e 00 00 00 00 00 00 00 2f 00 00 00 00 00 00 00 |......../.......|
00000010 30 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |0.......1.......|
00000020 32 00 00 00 00 00 00 00 33 00 00 00 00 00 00 00 |2.......3.......|
00000030 34 00 00 00 00 00 00 00 35 00 00 00 00 00 00 00 |4.......5.......|
00000040 36 00 00 00 00 00 00 00 37 00 00 00 00 00 00 00 |6.......7.......|
00000050 42 00 00 00 00 00 00 00 43 00 00 00 00 00 00 00 |B.......C.......|
00000060 44 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 |D.......E.......|
00000070 46 00 00 00 00 00 00 00 47 00 00 00 00 00 00 00 |F.......G.......|
00000080 48 00 00 00 00 00 00 00 49 00 00 00 00 00 00 00 |H.......I.......|
00000090 4a 00 00 00 00 00 00 00 4b 00 00 00 00 00 00 00 |J.......K.......|
000000a0 4c 00 00 00 00 00 00 00 4d 00 00 00 00 00 00 00 |L.......M.......|
000000b0 4e 00 00 00 00 00 00 00 4f 00 00 00 00 00 00 00 |N.......O.......|
000000c0 50 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 |P.......P.......|
000000d0 51 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 |Q.......P.......|
000000e0 51 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 |Q.......R.......|
000000f0 50 00 00 00 00 00 00 00 51 00 00 00 00 00 00 00 |P.......Q.......|
00000100 52 00 00 00 00 00 00 00 53 00 00 00 00 00 00 00 |R.......S.......|
00000110 50 00 00 00 00 00 00 00 51 00 00 00 00 00 00 00 |P.......Q.......|
00000120 52 00 00 00 00 00 00 00 53 00 00 00 00 00 00 00 |R.......S.......|
...
You can clearly see duplicate OIDs (the first being 0x50 at offset 0xc8).
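For what it's worth, such a tail heap can be decoded with a few lines of Python. A sketch assuming 64-bit little-endian OIDs (matching a "64-bit oids" build); the bytes are taken from offset 0xc0 of the hexdump above:

```python
import struct

# 32 bytes of the tail heap starting at offset 0xc0, copied from the hexdump
tail = bytes.fromhex(
    "5000000000000000" "5000000000000000"
    "5100000000000000" "5000000000000000"
)

# Each OID is an unsigned 64-bit little-endian integer
oids = [struct.unpack_from("<Q", tail, off)[0]
        for off in range(0, len(tail), 8)]
dupes = sorted({x for x in oids if oids.count(x) > 1})
print([hex(o) for o in oids])
print([hex(d) for d in dupes])
```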
Its entry in BBP.dir (split into sections):
136 32 tmp_210 tmpr_210 02/210 610523782 2 171807 0 0 171807 172032 0 0 0 0
void 0 1 1793 0 0 0 0 0 1000651 0 0 0
oid 8 0 1024 24 25 27 1 46 773603235 1374456 1376256 1
We are using: MonetDB 5 server v11.21.14 (64-bit, 64-bit oids, 128-bit integers).
--
Frédéric Jolliton
Sécuractive
Hello,
I am creating a table from the result of a Python function. One column is composed of double values; some of them are np.inf and are stored in MonetDB as inf.
I cannot find a way to query for these values. How does MonetDB handle infinity?
Moreover, if I later run a query on this table and apply an arithmetic operation to the inf values, MonetDB throws an 'overflow in calculation' error. I would expect any operation (except for special cases) to result in inf. Is this a known issue or intended design?
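For reference, the behaviour expected here is what IEEE 754 doubles do natively, e.g. in Python:

```python
import math

inf = float("inf")
# Under IEEE 754 arithmetic, most operations on infinity stay infinite
# rather than raising an overflow error:
assert inf + 1 == inf
assert inf * 2 == inf
assert math.isinf(inf - 1)
# Only genuinely undefined combinations produce NaN:
assert math.isnan(inf - inf)
print("IEEE 754 infinity arithmetic: ok")
```

The question is whether MonetDB's overflow checks are intended to fire on already-infinite inputs, or only on finite operands whose result exceeds the double range.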
Regards,
Stefano
For those who are interested, attached is the "php_mapi.inc"
which I reworked into a PHP class (OOP).
Feel free to use it or upload it to any repository.
Regards.
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Jun2016-SP1 bugfix release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<http://www.monetdb.org/>.
For details on this release, please see the release notes at
<http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
Jun2016-SP1 bugfix release
* Lots of memory leaks have been plugged across the whole system.
Java Module
* Corrected PROCEDURE_TYPE output value of method
DatabaseMetaData.getProcedures(). It used to return
procedureReturnsResult. Now it returns procedureNoResult. Corrected
ORDINAL_POSITION output value of method
DatabaseMetaData.getProcedureColumns(). It used to start with 0,
but as procedures do not return a result value it now starts with 1
for all the procedure arguments, as defined by the JDBC API.
* Improved output of method DatabaseMetaData.getProcedures(). The
REMARKS column now contains the procedure definition as stored in
sys.functions.func. The SPECIFIC_NAME column now contains the
procedure unique identifier as stored in sys.functions.id. This
allows the caller to retrieve the specific overloaded procedure
which has the same name, but different arguments. Also improved
output of method DatabaseMetaData.getProcedureColumns(). The
SPECIFIC_NAME column now contains the procedure unique identifier
as stored in sys.functions.id. This allows the caller to retrieve
the proper arguments of the specific overloaded procedure by
matching the SPECIFIC_NAME value.
* Improved output of method DatabaseMetaData.getFunctions(). The
REMARKS column now contains the function definition as stored in
sys.functions.func. The SPECIFIC_NAME column now contains the
function unique identifier as stored in sys.functions.id. This
allows the caller to retrieve the specific overloaded function
which has the same name, but different arguments. Also improved
output of method DatabaseMetaData.getFunctionColumns(). The
SPECIFIC_NAME column now contains the function unique identifier as
stored in sys.functions.id. This allows the caller to retrieve the
proper arguments of the specific overloaded function by matching
the SPECIFIC_NAME value.
Bug Fixes
* 4014: KILL signal
* 4021: Analyze query does not escape input [security]
* 4026: JDBC driver incorrectly converts TINYINT fields to String
instead of an integer type.
* 4028: inputs not the same size
* 4031: mclient doesn't accept - argument to refer to stdin
* 4032: no decimal places after update. ODBC driver
* 4035: SQL Function call bug
* 4036: Possible sql_catalog corruption due to unclean backuped tail
Hi,
in the process of writing a MAL backend for a query compiler, I'm trying
to optimize the join order of arbitrary theta joins on whole relations
(i.e. predicates may be arbitrary). To implement this, I use a simple
heuristic for 2-way theta joins on relations: first the cheap predicates
are identified, and the necessary joins are performed on the columns.
After that, all results of the cheap predicates are merged via joins on
the OIDs. Finally, expensive predicates (i.e. those with many qualifying
tuples) are implemented by applying a binary scalar operation and using
a *subselect* and a *projection*/*leftfetchjoin*, since all columns are
aligned at that point. To conclude: 2-way joins may not be atomic MAL
joins (i.e. not column-wise), but they very well could be.
So, what I'm looking for is some facility in MonetDB with which I can
break down an n-way join into 2-way joins with the smallest possible
intermediate results, without assuming too much about how those 2-way
joins are implemented. The first thing I found was the *scheduler*
module, which is also on the official website (
https://www.monetdb.org/Documentation/Manuals/MonetDB/Optimizers/Memoization
). But that seems to be out of date, since most of the functions are
missing and the semantics of join operators in MAL have changed quite a
bit. Is there anything else I can use for this task (aside from
implementing some dynamic programming algorithm within MAL)?
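For reference, the classic Selinger-style dynamic program over subsets is compact enough to sketch outside MAL. The cardinalities and selectivities below are illustrative stand-ins for whatever statistics the backend can supply; the DP picks, for every subset of relations, the cheapest split into two smaller joins:

```python
from itertools import combinations

# Illustrative statistics: base cardinalities and pairwise join selectivities.
card = {"A": 1000, "B": 100, "C": 10_000}
sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.001, frozenset("AC"): 0.1}

def join_size(s):
    # Estimated cardinality of joining all relations in subset s:
    # product of base cardinalities times all applicable selectivities.
    size = 1.0
    for r in s:
        size *= card[r]
    for pair, f in sel.items():
        if pair <= s:
            size *= f
    return size

rels = frozenset(card)
# best maps a subset to (total cost of intermediates, nested 2-way join plan)
best = {frozenset({r}): (0.0, r) for r in card}
for n in range(2, len(rels) + 1):
    for sub in map(frozenset, combinations(sorted(rels), n)):
        for k in range(1, n):
            for left in map(frozenset, combinations(sorted(sub), k)):
                right = sub - left
                cost = best[left][0] + best[right][0] + join_size(sub)
                if sub not in best or cost < best[sub][0]:
                    best[sub] = (cost, (best[left][1], best[right][1]))

cost, plan = best[rels]
print(plan, cost)
```

This is exponential in the number of relations, which is fine for the handful of relations typical queries join; the hard part in practice is supplying realistic selectivity estimates, not the DP itself.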
Kind regards,
Moritz
All,
I was planning on loading a shared library within either an R or Python function inside a MonetDB server. Is this supported? What kinds of constraints do I need to worry about? I'm guessing any memory problems will impact the server, but I'm not sure about thread safety etc.
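For the Python side, one way this is commonly done is with ctypes inside the UDF body. A minimal sketch, assuming a glibc Linux host where libm.so.6 is available; the library and the cos() call are purely illustrative, not MonetDB specifics:

```python
import ctypes

# Load a shared library by name; inside an embedded-Python UDF this runs
# in the server process, so a crash in the library takes the server down.
libm = ctypes.CDLL("libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

value = libm.cos(0.0)
print(value)
```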
Thanks,
Dave
Hi!
We are facing a problem with MERGE tables.
We have a process in PHP which connects to the DB using MAPI (
https://www.monetdb.org/Documentation/Manuals/SQLreference/Programming/MAPI
)
The process creates a new table every day as follows:
CREATE TABLE cub8_20160717 AS (SELECT * FROM cub8) WITH NO DATA;
Then, it attaches that new table to the MERGE table called cub8_cubo as
follows:
ALTER TABLE cub8_cubo ADD TABLE cub8_20160717 ;
Once all this is created, it starts inserting data.
Everything works fine, with no errors. But if I run a query against
cub8_cubo, I don't see the cub8_20160717 data.
So I run this query to check whether the new table was attached to the
merge table:
SELECT a.name, b.name, s.name FROM sys.tables as a JOIN sys.dependencies d
ON a.id = d.depend_id JOIN sys.tables as b ON b.id= d.id JOIN sys.schemas
as s ON a.schema_id = s.id WHERE a.type= 3 AND a.name ILIKE '%cub8%' ORDER
BY 1,2;
+-----------+---------------+
| name | name |
+===========+===============+
| cub8_cubo | cub8_20160710 |
| cub8_cubo | cub8_20160711 |
| cub8_cubo | cub8_20160712 |
| cub8_cubo | cub8_20160713 |
| cub8_cubo | cub8_20160714 |
| cub8_cubo | cub8_20160715 |
| cub8_cubo | cub8_20160716 |
| cub8_cubo | cub8_20160717 |
So I try to detach it from cub8_cubo with the command:
ALTER TABLE cub8_cubo DROP TABLE cub8_20160717;
ALTER TABLE: table 'sb_traf.cub8_20160717' isn't part of the MERGE TABLE
'sb_traf.cub8_cubo'
So I ADD it again:
ALTER TABLE cub8_cubo ADD TABLE cub8_20160717;
and now, yes... I can see the data. If I need to detach that table, I have
to run the ALTER ... DROP two or three times to get it done.
Is this a bug, or are we doing something wrong?
Thanks in advance!
PS: We are using
MonetDB Database Server v1.7 (Jun2016)
Welcome to mclient, the MonetDB/SQL interactive terminal (Jun2016)