Hi there,
We have an issue about the remote table on monetdb version of 2016 (sp1
and sp2)
on Ubuntu 14.04 and on Windows Server2012 R2
The scenario for reproducing the issue with two nodes
on node1
Create a table on remote node and fill it:
CREATE TABLE test(id int not null, name text, valid boolean);
INSERT INTO test (id, name) VALUES (1, '1');
INSERT INTO test (id, name) VALUES (2, '2');
INSERT INTO test (id) VALUES (3);
on node2
CREATE REMOTE TABLE test(id int not null, name text, valid boolean) ON
'mapi:monetdb://node1:50000/dbfarm';
Then on node2:
select * from test;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 1 | 1 | null |
| 2 | 2 | null |
| 3 | null | null |
+------+------+-------+
It works fine, but:
select * from test where name is null;
+----+------+-------+
| id | name | valid |
+====+======+=======+
+----+------+-------+
id 3 should appear here. Furthermore:
select * from test where name is not
null;
(mapi:monetdb://monetdb@192.168.254.31/reports2) Cannot
register
project
(
select
(
table(sys.test) [ test.id NOT NULL, test.name, test.valid ]
COUNT
) [ clob "NULL" ! <= test.name ! <= clob "NULL"
]
) [ test.id NOT NULL, test.name, test.valid ] REMOTE
mapi:monetdb://.../...
select * from test where valid is null;
illegal input, not a JSON header (got '')
and node1 is crashed (need : monetdb start farm).
After downgrading on ubuntu 14.04 to version of 2015 (SP4) this
scenraio works fine.
Thanks,
SG
Mark,
I think there is a bug in the implementation of the PYTHON_MAP aggregate functions.
The return value of a PYTHON_MAP aggregate function can change if there are other functions in the query.
The return value of my function weighted_percentile_0 (definition below) changes if I include the function median() in my query. The results are incorrect if median() isn't in the query, but are correct if median() is in the query. In this test case, the weights (v2) are all 1 and the values (v1) are uniformly sampled integers from [1,10]. Number of rows for each fctr is ~1000, and my function should reduce to min(v1) in this case.
Is some information bleeding between threads? Should I submit a bug report? This is on the default branch.
Thanks,
Dave
sql>select fctr,weighted_percentile_0(v1,v2),min(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+
| fctr | L1 | L2 |
+======+==========================+======+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 1 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 2 | 1 |
| 8 | 1 | 1 |
| 9 | 2 | 1 |
+------+--------------------------+------+
9 tuples (126.928ms)
sql>select fctr,weighted_percentile_0(v1,v2),min(v1),median(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+------+
| fctr | L1 | L2 | L3 |
+======+==========================+======+======+
| 1 | 1 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 1 | 5 |
| 4 | 1 | 1 | 5 |
| 5 | 1 | 1 | 5 |
| 6 | 1 | 1 | 5 |
| 7 | 1 | 1 | 5 |
| 8 | 1 | 1 | 5 |
| 9 | 1 | 1 | 5 |
+------+--------------------------+------+------+
9 tuples (519.195ms)
sql>
CREATE AGGREGATE weighted_percentile_0(a DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON_MAP {
import numpy as np
# Standardize and sort based on values in a
q = np.array([0]) / 100.0
idx = np.argsort(a)
a_sort = a[idx]
w_sort = w[idx]
# Get the cumulative sum of weights
ecdf = np.cumsum(w_sort)
# Find the percentile index positions associated with the percentiles
p = q * (w_sort.sum() - 1)
# Find the bounding indices (both low and high)
idx_low = np.searchsorted(ecdf, p, side='right')
idx_high = np.searchsorted(ecdf, p + 1, side='right')
idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
# Calculate the weights
weights_high = p - np.floor(p)
weights_low = 1.0 - weights_high
# Extract the low/high indexes and multiply by the corresponding weights
x1 = np.take(a_sort, idx_low) * weights_low
x2 = np.take(a_sort, idx_high) * weights_high
wp = np.add(x1,x2)
return(wp[0])
};
Hi,
After reviewing all the other alternatives like SQL and Python UDF, I was
either stuck on performance with SQL UDF or on usability with Python UDF
(unable to use with aggregation, and not such great performance with
dates),
so I decided to go the hard way with C functions, as a bonus it will give
me the possibility to change the functionalities without worrying about
dependencies, which was not the case in other languages.
The purpose is to create a set of formatting functions for Year, Quarter,
Month, Week and Day brackets, and of course i need to create the bulk
version of each function for performance.
Starting from the MTIMEdate_extract_year_bulk, now i have the simple
function working, and successfully calling it from mclient:
*str*
*UDFyearbracket(str *ret, const date *v)*
*{*
* if (*v == date_nil) {*
* *ret = GDKstrdup(str_nil);*
* } else {*
* int year;*
* fromdate(*v, NULL, NULL, &year);*
* *ret = (str) GDKmalloc(15);*
* sprintf(*ret, "%d", year);*
* }*
* return MAL_SUCCEED;*
*}*
For the bulk version i get an error in the log: gdk_atoms.c:1345: strPut:
Assertion `(v[i] & 0x80) == 0' failed.
*str*
*UDFBATyearbracket(bat *ret, const bat *bid)*
*{*
* BAT *b, *bn;*
* BUN i,n;*
* str *y;*
* const date *t;*
* if ((b = BATdescriptor(*bid)) == NULL)*
* throw(MAL, "UDF.BATyearbracket", "Cannot access descriptor");*
* n = BATcount(b);*
* bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);*
* if (bn == NULL) {*
* BBPunfix(b->batCacheid);*
* throw(MAL, "UDF.BATyearbracket", "memory allocation failure");*
* }*
* bn->tnonil = 1;*
* bn->tnil = 0;*
* t = (const date *) Tloc(b, 0);*
* y = (str *) Tloc(bn, 0);*
* for (i = 0; i < n; i++) {*
* if (*t == date_nil) {*
* *y = GDKstrdup(str_nil);*
* } else*
* UDFyearbracket(y, t);*
* if (strcmp(*y, str_nil) == 0) {*
* bn->tnonil = 0;*
* bn->tnil = 1;*
* }*
* y++;*
* t++;*
* }*
* BATsetcount(bn, (BUN) (y - (str *) Tloc(bn, 0)));*
* bn->tsorted = BATcount(bn)<2;*
* bn->trevsorted = BATcount(bn)<2;*
* BBPkeepref(*ret = bn->batCacheid);*
* BBPunfix(b->batCacheid);*
* return MAL_SUCCEED;*
*}*
PS: I am not a c expert but i can find my way with basic operations and
pointers.
Any help or suggestions is appreciated.
Thank you.
Hi ALL
I encountered a problem using a COPY INFO
The command to execute:
[hotel@02 data_visual_monetdb]$ mclient -u monetdb -d hotel-revenue -s "COPY 1727300 RECORDS INTO hotel_revenue_octopus_hotel_info FROM '/home/ping.gao/data_visual_monetdbinfo.data' USING DELIMITERS ',','\\n'<smb://n'>,'\"'"
password:
Failed to import table Leftover data '"0”'
Data Format(Line separator —> \n):
“test”,”test”,”test”,”test”
I’m trying to do automated incremental backups following these instructions:
https://www.monetdb.org/Documentation/UserGuide/FastDumpRestore
My question is whether why isn’t it enough to just stop the DB?
Is the waiting for open sessions a courtesy or is it required that no statements are executing prior to shutdown? Is it similar for the shutdown command? It gives people in a session time to exit or something?
Again, these are to be automated at a time of night that queries should not be being executed.
Bottom line question: is it enough to stop and lock a database to ensure the binary backup will have no errors that prevent the backup from being restored or are the other steps required?
Thanks,
Vince
The information transmitted, including any attachments, is intended only for the individual or entity to which it is addressed, and may contain confidential and/or privileged information. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by individuals or entities other than the intended recipient is prohibited, and all liability arising therefrom is disclaimed. If you have received this communication in error, please delete the information from any computer and notify the sender.
Hi,
I have a weird behavior when i package some sql code in a function, the
performance deteriorate.
The same code running directly in select statement is blinking fast.
ex:
*select date_to_str("Expiry_Date",'%Y-%m-%d') from table*: exec time is
0.72s
if i create a function:
*CREATE FUNCTION DayBracket(pdate date) returns char(10)*
*BEGIN*
* return date_to_str(pdate,'%Y-%m-%d');*
*end;*
*grant execute on function DayBracket to public;*
and execute *select DayBracket("Expiry_Date") from table*: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it
should be something related to loading the function.
The following function perform much better:
*CREATE FUNCTION DayBracketOpt(pdate date) returns char(10)*
*BEGIN*
* return extract(year from pdate) || '-' || lpad(extract(month from
pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0');*
*end;*
*grant execute on function DayBracketOpt to public;*
I also tried to create a Python function:
*CREATE FUNCTION pyDayBracket(pdate date) returns string*
*LANGUAGE PYTHON {*
* from time import strftime, strptime*
* return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate]*
*};*
it is taking too long and throwing an exception:buffer size mismatch
PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some
functions which are used in others functions and views. Any flags to turn
off dependency check before dropping?
Thank you.
Dear MonetDB users,
The MonetDB team wishes you a very Merry Christmas and Happy New Year!
Thank you very much for helping us improving our work! Hope to see you all back in 2017, in which we’re of course going to continue our work, and come up with new MonetDB releases with more features, better performance and improved stability.
Best wishes,
Ying Zhang (on behalf of the MonetDB team)
Hi Guys,
I remember this somehow got fixed in the next release or 2 however i am not
sure how it got fixed and possibly via a different issue.
So it should be closed.
https://www.monetdb.org/bugzilla/show_bug.cgi?id=3878
Regards,
Brian Hood
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Dec2016 feature release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<http://www.monetdb.org/>.
For details on this release, please see the release notes at
<http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
Dec 2016 feature release
MonetDB5 Server
* Removed the zorder module with functions zorder.encode,
zorder.decode_x and zorder.decode_y.
* Removed command bbp.getHeat().
* Removed bat.setColumn with two arguments and bat.setRole. Use
bat.setColumn with one argument instead.
* Removed function BKCappend_reverse_val_wrap: it was unused.
* The "wrd" type has been removed from GDK and MAL. The type was
defined to be a 32 bit integer on 32 bit architectures and a 64 bit
integer on 64 bit architectures. We now generally use "lng" (always
64 bits) where "wrd" was used.
* Keep a collection of full traces. Each time the SQL user applies
the TRACE option, the full json trace is retained within the
<dbpath>/<dbname>/sql_traces
Build Environment
* New packages MonetDB-python2 (Fedora) and monetdb-python2
(Debian/Ubuntu) have been created for Python 2 integration into
MonetDB.
* With OID size equal to ABI/word size, mserver5 does not need to
print the OID size, anymore.
* Removed obsolete code associated with long gone static linking
option.
* Removed configure option --enable-oid32 to compile with 32 bit OIDs
on a 64 bit architecture.
.
* The Perl, PHP, and Python clients, and the JDBC driver each now
have their own repositories and release cycles. The Python client
is maintained by Gijs Molenaar on Github
(https://github.com/gijzelaerr/pymonetdb), the other clients are
maintained by CWI/MonetDB on our own server
(https://dev.monetdb.org/hg/monetdb-java,
https://dev.monetdb.org/hg/monetdb-perl,
https://dev.monetdb.org/hg/monetdb-php).
MonetDB Common
* The tnokey values must now be 0 if it is not known whether all
values in a column are distinct.
* The 2-bit tkey field in the bat descriptor has been split into two
single bit fields: tkey and tunique. The old tkey&BOUND2BTRUE value
is now stored in tunique.
* Implemented conversion to str from any type (not just the internal
types).
* VALcopy and VALinit both return their first argument on success or
(and that's new) NULL on (allocation) failure.
* BATattach now can also create a str BAT from a file consisting of
null-terminated strings. The input file must be encoded using
UTF-8.
* BATattach now copies the input file instead of "stealing" it.
* Removed the lastused "timestamp" from the BBP.
* Removed batStamp field from BAT descriptor, and removed the
BBPcurstamp function.
* Removed unused functions BBPhot and BBPcold.
* Removed BATderiveTailProps and BATderiveProps. Just set the
properties you know about, or use BATsettrivprop.
* Removed the macro BUNfirst. It can be replaced by 0.
* Changed BATroles by removing the argument to set the name of the
head column.
* The head column is now completely gone. MonetDB is completely
"headless".
* The format of the BBP.dir file was simplified. Since the head
column is VOID, the only value that needs to be stored is the head
seqbase.
* BATs now only have a single (logical) name.
* The function BATmirror is gone. The HEAD column is always VOID
(with a non-nil seqbase) and the TAIL column carries the data. All
functions that deal with data work on the TAIL column.
* BATkey now works on the TAIL column instead of the HEAD column.
* Replaced BATseqbase with BAThseqbase and BATtseqbase, the former
for setting the seqbase on the HEAD, the latter for setting the
seqbase on the TAIL.
* Replaced function BATnew with COLnew with slightly different
arguments: the first argument of COLnew is the SEQBASE of the head
column (which is always VOID).
* The "wrd" type has been removed from GDK and MAL. The type was
defined to be a 32 bit integer on 32 bit architectures and a 64 bit
integer on 64 bit architectures. We now generally use "lng" (always
64 bits) where "wrd" was used.
SQL
* Removed functions sys.zorder_encode, sys.zorder_decode_x, and
sys.zorder_decode_y.
* The experimental recycler code is moved to the attic.
* The syntax of bat.new(:oid,:any) has been changed by dropping the
superflous :oid. All BATs are now binary associations between a
void column and a materialized value column. (except for the
internal :bat[:void,:void] representation of simple oid ranged
tails.)
Bug Fixes
* 3357: Implement setQueryTimeout()
* 3445: Add support for database name to dotmonetdb file
* 3597: SQL to MAL listing looses types
* 3973: JDBC hangs
* 3976: Performance enhancement to LIKE without wildcards
* 4005: Correlated update causes incorrect null constraint violation
* 4016: merge table only optimises for point query
* 4040: sys.storage call can take a long time
* 4047: Segfault when updating a dropped table
* 4050: Database corruption when running low on inode
* 4057: missing bulk operations between constant and bat
* 4061: SIGSEGV in candscan_lng
* 4066: Deadlocked monetdbd
* 6068: Error message about incompatible BBP version should be
clearer
* 6069: query with union all silently crashes
* 6070: setting negative session query timeout should not be
possible/allowed
* 6071: where clause with cast and floor fails to sigsegv
* 6072: Bind to UPD delta column does not get/show type information
in EXPLAIN
* 6073: Missing type information for constants in MAL explain
* 6074: SET ROLE command does not work
* 6075: gdk_calc.c:13113: BATcalcifthenelse_intern: Assertion `col2
!= NULL' failed.
* 6076: rel_optimizer.c:5426: rel_push_project_up: Assertion `e'
failed.
* 6077: mserver5: rel_optimizer.c:5444: rel_push_project_up:
Assertion `e' failed.
* 6078: rel_bin.c:2402: rel2bin_project: Assertion `0' failed.
* 6084: Merge table point to wrong columns if columns in partition
tables are deleted
* 6108: monetdb5-sql sysv init script does not wait for shutdown
* 6114: segfault raised in the query rewriter due to a null pointer
* 6115: Assertion hit in the codegen
* 6116: Codegen does not support certain kind of selects on scalar
subqueries
* 6117: Assertion hit in the query rewriting stage during the push
down phase
* 6118: SIGSEGV in strPut due to shared heap
* 6119: Assertion hit in the MAL optimiser on a complex query
* 6120: QUANTILE() treats NULL as if it is zero
* 6121: SELECT a.col IN ( b.col FROM b ) FROM a statements with no
error but no result
* 6123: Assertion hit in the codegen #2
* 6124: CASE <column> WHEN NULL THEN 0 ELSE 1 END returns wrong
result
* 6125: Stack overflow in the query rewriter with a query having an
OR condition and a nested SELECT subexpression
* 6126: batcalc.== can't handle void BATs
* 6139: Debian libmonetdb13 conflicts with libmonetdb5-server-geom